01:00 mupuf: In case someone wondered, hakzsam tested ssbo/atomics on maxwell GM1xx and it worked
01:00 mupuf: hakzsam: tested dEQP too?
01:00 hakzsam: nope
01:00 hakzsam: only piglit
01:00 hakzsam: that should be enough
02:20 karolherbst: hakzsam: do you plan to do something on reator today?
02:21 hakzsam: karolherbst, yeah, this afternoon, but have fun I don't use it right now :-)
02:21 karolherbst: hakzsam: nah, currently I don't need it too, have to craft some vbios files together which will take me some time actually
02:22 hakzsam: karolherbst, okay
03:39 jscinoz: Hi, so I have a GTX 960M (Maxwell NVC0). I've been trying to do a mmiotrace to dump the firmware, but I'm having a bit of trouble. I can get the binary driver working correctly, but when I try to load the nvidia kernel module with mmiotrace active it has a kernel oops and attempting to start xorg hangs indefinitely. Given that this only happens when mmiotrace is active, I wonder if nvidia is starting to
03:39 jscinoz: take countermeasures against firmware dumping?
03:42 mwk: jscinoz: I'd guess it's mere accident.... mmiotrace is not really rock-solid
03:43 jscinoz: mwk: ah, any pointers on what I can try? I've tried a few different versions of the nvidia driver and all unfortunately have the same outcome
03:43 karolherbst: ohh you could try my patch
03:43 jscinoz: I've tried with the latest in my distro (gentoo): 361.18-r4, and also 340.96
03:44 mupuf: karolherbst: I was about to say so :D
03:44 karolherbst: jscinoz: https://gist.githubusercontent.com/karolherbst/903bf75486134dd9505d/raw/6f97c12da078f8ea3d9e3a62241d12a103f22fab/mmiotrace.patch
03:44 karolherbst: mupuf: did you try it out yet?
03:44 jscinoz: thanks karolherbst, let me see if i can apply this
03:44 karolherbst: mupuf: ohh regarding the repeat issue, we could simply rollback the kernel patch for this one I guess
03:44 mupuf: karolherbst: nope, yesterday evening, I finally wrote the list of changes in DRM for linux 4.4
03:44 mupuf: yes, I was terribly late :s
03:45 karolherbst: jscinoz: well I did this on top of linux 4.4
03:45 mupuf: the repeat issue?
03:45 karolherbst: yeah
03:45 jscinoz: alrighty, it applied cleanly
03:45 karolherbst: but I didn't check what causes this
03:45 jscinoz: karolherbst: yep, i'm on 4.4
03:45 jscinoz: building kernel now
03:45 mupuf: karolherbst: sorry, ENEEDMORECONTEXT
03:46 karolherbst: mupuf: the repeate instruction is called inside kernel assembly
03:46 karolherbst: *repeat
03:46 mupuf: oh, I did not follow your discussion enough to have a clue
03:46 mupuf: I understood what you explained (related to huge pages)
03:46 karolherbst: yeah it is unrelated
03:47 jscinoz: with this patch, when trying to mmiotrace, should i use the latest nvidia driver, or one from the 340.XX series (the wiki mentioned the latter)?
03:47 karolherbst: jscinoz: doesn't matter
03:47 jscinoz: karolherbst: Alright, thanks, will let you know how it goes :)
03:47 karolherbst: but you may want to use the latest one generally
03:47 karolherbst: normally the newer drivers should do things better (tm)
03:47 jscinoz: yeah, i'd prefer to use the latest personally, just wasn't sure if that could be related to the issues when loading the driver while mmiotrace is active
03:49 karolherbst: mhh no idea what started to cause the mmiotrace issue though
03:50 karolherbst: mupuf: I am sure I had the issue somewhen earlier by the way, so maybe it is something volative like changes in the mtrr/pat code
03:50 karolherbst: *volatile
03:50 jscinoz: Ah, so I'm not the only one having this problem?
03:50 karolherbst: no
03:50 karolherbst: I also have it
03:50 karolherbst: for some reasons
03:50 jscinoz: I see
03:50 karolherbst: but I was able to do mmiotraces some months earlier
03:50 karolherbst: and before that I also had the issue at some point
03:50 karolherbst: so no clue why
03:51 mupuf: stars not aligning :)
03:51 karolherbst: I just figured out what was wrong with the mmiotrace code
03:51 karolherbst: :D
03:51 mupuf: or, in this case, memory pointers
03:51 karolherbst: mhhh
03:51 karolherbst: no, don't think so
03:51 karolherbst: the pointers are the same usually
03:51 karolherbst: I could imagine that linux sometimes changes how it does ioremaps
03:51 jscinoz: Awesome :D I'd love to get into kernel dev someday, but I don't even know C thus far; i'm a java/nodejs and most recently rust guy
03:52 mupuf: I thought it was an alignment issue?
03:52 karolherbst: yeah, but you know, my working traces had the same ioremap range sizes
03:52 mupuf: hehe, can't talk right now though
03:53 karolherbst: jscinoz: while mmiotracing it might take a while until X shows up, but it shouldn't take more than 2 minutes or something
03:54 karolherbst: and shouldn't hang
03:54 jscinoz: karolherbst: Noted, going to reboot now to try it out, will report back in 10mins or so
04:06 jscinoz: Same result unfortunately. It seems the system doesn't hang in that i can still ssh in, but i can no longer switch to a VT or do anything else on the machine directly
04:06 jscinoz: dmesg when trying to mmiotrace, with that patch: https://gist.github.com/anonymous/94352a6a4ec21670765d
04:06 jscinoz: same thing happened without the patch, however
04:15 pmoreau: jscinoz: FYI, Maxwell is GM10x and GM20x, but NVC0 (aka. GF1xx) is Fermi, so Maxwell NVC0 should not exist
04:16 pmoreau: And I doubt the 340.xx series has support for Maxwell cards, even more 2nd Maxwell, but I could be wrong
04:22 jscinoz: pmoreau: Huh, i think the wiki is wrong in that case
04:23 jscinoz: wait sorry, its not nvc0
04:23 jscinoz: but it is maxwell
04:23 karolherbst: ohhhh
04:24 karolherbst: that is a different issue
04:24 jscinoz: https://wiki.freedesktop.org/nouveau/CodeName its nv117/GM107
04:24 pmoreau: Sounds better ;-)
04:24 jscinoz: card is 960M
04:24 jscinoz: karolherbst: ah, so the kernel patch isn't applicable?
04:24 karolherbst: not for this one, well you still could get the other issue as well, but this is something odd
04:24 pmoreau: I had a look at the latest release of 340.xx (340.96), and it doesn't support the 960M, so you should rather use the latest version.
04:24 karolherbst: it happened to me too, but only like once
04:25 jscinoz: pmoreau: Ah, noted. the dmesg log above was with the latest nvidia-drivers in gentoo which are 361.18-r4
04:25 jscinoz: which is*
04:25 pmoreau: Ok
04:25 karolherbst: well the latest is 361.28 now ;)
04:25 karolherbst: but shouldn't matter for this card
04:25 jscinoz: ah, will resync anyway though
04:26 pmoreau: I've hear about a NULL ptr dereference with recent versions of the NVIDIA driver, but… can't remember where
04:26 pmoreau: s/hear/heard
04:28 karolherbst: ohhhh
04:28 karolherbst: that __down function is inside linux/list.h
04:29 karolherbst: ohh wait
04:29 karolherbst: that doesn't make much sense
04:30 karolherbst: jscinoz: wanna enable CONFIG_DEBUG_LIST in the kernel?
04:30 karolherbst: maybe this will give us a more detailed description of the issue
04:34 jscinoz: karolherbst: will do, one moment
04:34 jscinoz: Okay, building
04:39 drathir: [vo/opengl] after rendering Opengl error INVALID OPERATION
04:40 drathir: any ideas about that error?
04:42 drathir: with sudo looks like workin...
04:43 drathir: user even added to audio and video group...
05:02 drathir: try playing through mpv
05:02 drathir: but smplayer looks like workin..
05:05 jscinoz: okay, rebooting to get log with the extra config option + latest nvidia blob
05:18 jscinoz: karolherbst: https://gist.github.com/anonymous/05a24e915bdd353766e4 is dmesg with CONFIG_DEBUG_LIST enabled
05:27 jscinoz: Unfortunately, I must turn in now as it's getting rather late down here. Should be around tomorrow evening to continue looking into this issue. thanks for all your help :)
05:32 drathir: looks like mplayer have problems play vdpau but no idea why with sudo works...
05:33 drathir: now somethin new [vo/opengl/vdpau] Before uninitializing OpenGL interop: OpenGL error INVALID_OPERATION.
05:33 mlankhorst: do you compile with --shared-glapi ?
05:35 drathir: its arch mpv let me check their pkgbuild script...
05:38 drathir: --enable-zsh-comp \ --enable-libmpv-shared \ --enable-cdda only see...
05:39 drathir: mplayer looks like workin from user and mpv with root only...
07:05 drathir: the strangest thing is that user not work sudo workin thats looks like some kind of access permissions issue...
07:06 drathir: now little more info https://gist.github.com/54a56884ecf18eae39ba
07:07 drathir: but honestly no idea what that mean ;/
07:07 drathir: if that is mpv or nouveau issue connected...
08:14 imirkin: drathir: mpv + vo=gl,hwdec=vdpau won't work on nouveau, without substantial changes in nouveau.
08:20 imirkin: jscinoz: the firmware won't be in the mmiotrace, sorry
08:51 karolherbst: how....
08:51 karolherbst: mupuf_: that driver is just plain crazy
08:52 karolherbst: I just wiped out the entire cstep table except the first entry and set it as the max for every pstate, result: the driver still reclocks just fine
08:53 imirkin: probably still reads from acpi
08:53 karolherbst: on reator?
08:53 imirkin: oh, no.
08:54 karolherbst: I guess the driver has a smart fallback table or something like that
08:54 karolherbst: when something is fishy: fallback to tables generated on the fly
08:59 karolherbst: well then I confirm the fan_mgmt table then
09:03 karolherbst: mupuf_: yep, the fan_mgmt table is kind of doing what I thought it does: 40° 30 "duty": gets me 0x154 duty with 0x438 div: 0.314, close enough
09:04 karolherbst: 80° has 45 duty: gets me 0x1e0 with 0x438 div: 0.444
09:06 karolherbst: 95° has 100 duty: gets me 0x406 (increasing) with 0x438 div: 0.953
09:07 karolherbst: well the gpu is at 0x419 now at 95°
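The ratios quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: treating "duty register / divider" as the effective fan duty is an assumption taken from the way the numbers are reported here, and the names `DIV`, `readings`, and `duty_ratio` are made up for illustration.

```python
# Divider value reported in the log (0x438 = 1080).
DIV = 0x438

# (temperature in C, "duty" entry from the fan_mgmt table, duty register read back)
# All three rows are the values quoted in the conversation above.
readings = [
    (40, 30, 0x154),
    (80, 45, 0x1E0),
    (95, 100, 0x406),
]


def duty_ratio(reg_duty, div=DIV):
    """Return the duty register as a fraction of the divider (assumed meaning)."""
    return reg_duty / div


for temp_c, table_duty, reg_duty in readings:
    ratio = duty_ratio(reg_duty)
    print(f"{temp_c} C: table entry {table_duty}, register ratio {ratio:.4f}")

# The later reading of 0x419 at 95 C, using the same divider.
print(f"0x419 -> {duty_ratio(0x419):.4f}")
```

The printed ratios are rounded, while the figures in the log look truncated to three decimals, so the last digit may differ by one.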
09:15 karolherbst: ohhhh
09:15 karolherbst: weird
09:16 karolherbst: imirkin: is it normal that the vbios I put on the card via pramin differs from the one I fetch via pramin?
09:16 imirkin: no.
09:17 karolherbst: and with differs I mean the read out one shifted
09:17 karolherbst: bunch of 0 bytes at the beginning
09:17 karolherbst: at 0x5004 the vbios begins
09:17 imirkin: weird.
09:17 imirkin: don't think i've seen that.
09:17 karolherbst: its a gm107 one
09:18 karolherbst: ohh wait
09:18 karolherbst: no
09:19 karolherbst: everything up to 0x5004 is overwritten by 0 :/
09:19 karolherbst: ohhhh
09:19 karolherbst: and 0x619f04 is 0x1
09:19 karolherbst: just fine
09:20 karolherbst: mupuf_: your maxwell also has broken fakevbios
09:20 karolherbst: don't know if you knew
09:20 imirkin: probably it's a secondary gpu and the first time you loaded nvidia it ran the bios tables
09:23 karolherbst: nope, same after clean reboot
09:24 karolherbst: I think we have to fix nvafakebios for real, because currently the vbios is written into some memory region not claimed by anybody
09:24 karolherbst: with funny side effects
09:44 imirkin: karolherbst: what makes you say GK110 is the cutoff? did you have Tom^ check?
09:44 karolherbst: yeah I checked this on toms card
09:44 karolherbst: I found this out while doing the fsrm stuff
09:44 karolherbst: and on his card it was all different
09:45 imirkin: ah :)
09:45 karolherbst: but there is more in those
09:46 karolherbst: THRESHOLD_UNK30C might be also the critical threshold, but I never dig into that deeper
14:32 jscinoz: mm, it seems i now get that null dereference even without trying to mmiotrace
15:34 jscinoz: Alright, finally managed to capture a mmiotrace of the binary driver :D
15:34 jscinoz: Had to set up the mmiotracing in initramfs; it seemed if i did it later the nvidia module would have an OOPS when loading
15:39 Javantea: jscinoz: interesting, what did the oops look like?
15:41 jscinoz: Javantea: https://gist.github.com/anonymous/94352a6a4ec21670765d
15:42 jscinoz: So, I used demmio with the perl script from https://wiki.freedesktop.org/nouveau/NVC0_Firmware/ this page, it complained about "I don't know which chipset variant to use!" a few times, but it did still produce a number of output files
15:42 jscinoz: i renamed them to nv117_XXXXX and put them in /lib/firmware/nouveau; then modprobed nouveau
15:42 jscinoz: not sure if it worked; there's no output of any kind from nouveau in dmesg
15:43 imirkin_: jscinoz: if you have a GM107, the firmware in kernel 4.1+ should be fine...
15:44 jscinoz: Ahaha, really?
15:44 jscinoz: So i should just load it normally and 3d will work? I note the main page on the wiki says GM107 acceleration was merged in 4.1
15:44 jscinoz: and i'm on 4.4.1
15:45 imirkin_: work is a relative term, but yes
15:45 imirkin_: unfortunately we've collectively been too lazy to add exa support for maxwell, so you're stuck with glamor
15:46 imirkin_: glamor, in turn, triggers bugs in nouveau, so you get bad font rendering
15:46 jscinoz: Hmm, let me see if I can get portage to actually install xf86-video-nouveau with glamor enabled, it's not setting the flag for some reason
15:50 jscinoz: Ah, i guess it's the default now, that flag only existed on 1.0.11; it's not present on the later ebuilds
15:51 imirkin_: glamor is gone entirely, nouveau won't load for maxwell - you end up using the modesetting driver
15:52 jscinoz: Oh..
15:52 imirkin_: because the glamor integration in nouveau was broken, and even if it were fixed, that'd just be identical to using modesetting, so... dropping it seemed like the right move
15:52 jscinoz: So, if I want to actually use the nvidia gpu in this machine, I have to use the binary driver?
15:52 imirkin_: (modesetting has a working glamor integration)
15:52 imirkin_: huh?
15:53 jscinoz: oh okay, i thought glamor was something mandatory for it to work
15:53 imirkin_: i meant xf86-video-nouveau won't load for maxwell
15:53 imirkin_: it is.
15:53 imirkin_: and it will work "fine" with xf86-video-modesetting
15:53 jscinoz: but with xf86-video-modesetting (or xf86-video-intel - it's an optimus setup), would the nvidia card still be available as an offload provider in xrandr?
15:53 imirkin_: should be, yea
15:54 imirkin_: if you're looking to just use it for 3d, then you can just get the intel ddx going with dri3
15:54 imirkin_: and not worry about all this ddx stuff
15:54 imirkin_: if there are screens on it you want to access, you need to get all this going
15:54 jscinoz: Thankfully nope, on this one all the displays are connected to the intel GPU
15:54 jscinoz: my last machine had some on the nvidia one which was a bit of a pain
15:54 imirkin_: note that there's no reclocking, so it'll probably be slower than the intel gpu
15:55 jscinoz: Oh, that's a shame
15:55 jscinoz: i'll give it a try anyways, for curiosity's sake
15:55 imirkin_: sure
15:55 jscinoz: so, just to clarify, in a prime setup, the intel driver needs to be in dri3 mode?
15:55 jscinoz: it won't work if its in the default dri2 mode?
15:55 imirkin_: needs? no. but i'd recommend it.
15:56 imirkin_: if you're using dri2, you need to take more steps. but it should work too.
15:56 jscinoz: Alright, i'll go look up how to put it in dri3 mode
15:56 imirkin_: build the intel ddx with --enable-dri3
15:56 imirkin_: and then stick Option "DRI" "3" into your xorg.conf
15:58 jscinoz: Is there a way to check what flags the driver was built with?
15:59 imirkin_: not sure
16:00 jscinoz: i'll check the build logs. I think it probably has it, as the ebuild has a dep on dri3proto
16:00 imirkin_: gentoo force-disables it
16:01 jscinoz: Ah, i'll see if i can make it do it with extra_econf
16:02 jscinoz: okay, yep it's passing it to configure now
16:07 jscinoz: Alright, so I've got DRI3 enabled for the intel driver, but xrandr --listproviders only shows the intel card - there's no provider from nouveau
16:11 jscinoz: ah wait i see, i don't need to use xrandr with DRI3
16:11 imirkin_: that's right
16:11 imirkin_: like i said, fewer steps :)
16:12 drathir: imirkin_: but why the mplayer+vdpau workin? the same mpv+vdpau runnin with sudo?
16:15 imirkin_: drathir: no clue
16:15 imirkin_: drathir: probably mplayer + vdpau doesn't actually end up using vdpau for decoding?
16:15 jscinoz: I think i might need to rebuild mesa or something. DRI_PRIME=1 glxinfo | grep vendor still shows intel
16:15 imirkin_: drathir: does vdpauinfo say that hw decoding is supported?
16:15 imirkin_: jscinoz: what version of mesa do you have? and was it built with dri3 support?
16:16 jscinoz: imirkin_: 11.1.1 and yes
16:16 imirkin_: jscinoz: LIBGL_DEBUG=verbose DRI_PRIME=1 glxinfo |& head
16:16 drathir: imirkin_: but its reported as hw accelerated also is visable cpu down usage... thats a little mystery become...
16:16 jscinoz: Huh, |&, never seen that token before
16:16 imirkin_: same as >&
16:16 imirkin_: but with pipe
16:17 imirkin_: drathir: ok, dunno
16:17 imirkin_: drathir: not sure i care to figure it out :)
16:17 jscinoz: I see
16:17 imirkin_: jscinoz: pastebin the result of that
16:17 jscinoz: imirkin_: https://gist.github.com/anonymous/34a2039969a2e0e2703c
16:17 drathir: imirkin_: its 8600gts yes vdpauinfo and vainfo report support of hw acceleration...
16:18 imirkin_: jscinoz: hmmmmm... so you have DRI3, that's good. is nouveau loaded?
16:18 imirkin_: jscinoz: ls -l /dev/dri
16:20 jscinoz: imirkin_: https://gist.github.com/anonymous/85097c12094e8c5790a9
16:20 imirkin_: jscinoz: nouveau didn't load properly then... pastebin dmesg?
16:21 jscinoz: imirkin_: https://gist.github.com/anonymous/6345bdc6ad769d334220
16:21 jscinoz: no mention of nouveau in it at all
16:21 imirkin_: i don't see any mentions of nouveau in there
16:21 imirkin_: perhaps you have nouveau.modeset=0 somewhere?
16:21 imirkin_: blob drivers like to throw that in behind your back
16:21 jscinoz: Ah one sec
16:21 jscinoz: ah yep, that's in my cmdline
16:22 imirkin_: you'll want to remove that :)
16:22 imirkin_: modeset=0 means "don't do anything"
16:22 jscinoz: will rebuild kernel and retry (EFI system so cmdline has to be hard-coded in kernel unfortunately)
16:22 imirkin_: lol
16:22 imirkin_: use gummiboot
16:22 imirkin_: (that's what i do)
16:22 jscinoz: iirc last time i tried without, it resulted in no console
16:22 jscinoz: ah will try that out later on; this works for now and i don't want to break things :P
16:23 imirkin_: did you have your keyboard plugged into a monitor usb, which in turn was plugged into the actual usb port?
16:23 imirkin_: coz that config greatly upset gummiboot
16:23 jscinoz: imirkin_: no, it's a laptop - no external peripherals
16:23 imirkin_: ah
16:24 jscinoz: should i build nouveau into the kernel too, or leave it as a module?
16:24 imirkin_: i would recommend leaving it as a module
16:24 imirkin_: in fact, you can unload it now
16:24 jscinoz: alright, will do
16:24 imirkin_: and reload it with modeset=1
16:24 jscinoz: should it be in the initramfs?
16:24 imirkin_: which should override the kernel parameter
16:24 imirkin_: dunno, depends on what you put into initramfs
16:24 imirkin_: my initramfs has no kernel-dependent items
16:25 imirkin_: just enough to decrypt the disk and keep going
16:25 jscinoz: Ah yep, i just found that i was having issues getting a console at all to enter decryption passphrase
16:26 jscinoz: i'll leave it out for now and see how it goes
16:26 drathir: imirkin_: always nouveau shouldnt be placed into mkinitcpio modules section?
16:26 imirkin_: drathir: depends how your system is set up
16:26 imirkin_: drathir: if you're doing a distro-style build, yes
16:27 drathir: imirkin_: thanks good to know, didnt know that...
16:30 imirkin_: drathir: i update my initramfs once every... never.
16:30 imirkin_: once i made it and it worked, i think i updated it once for something dumb, and that was it
16:49 jscinoz: imirkin_: That fixed it, glxinfo shows the correct vendor now when DRI_PRIME=1
16:50 imirkin_: cool
16:50 imirkin_: if you're interested in helping improve the driver on maxwell, let me know
16:51 imirkin_: so far i've not really cared about it because GM20x is out of reach for now
16:51 jscinoz: Yeah, how can I help?
16:51 jscinoz: Unfortunately, as you said, performance is pretty poor and there's some hilarious font & texture issues in games
16:51 imirkin_: what's your (relevant) skillset?
16:52 jscinoz: not much unfortunately, i'm a dev, but not really with low level stuff. java/nodejs and a tiny bit of rust
16:52 jscinoz: fairly strong on the sysadmin/debugging(ish) side of things at least
16:52 imirkin_: how about opengl?
16:53 jscinoz: Never worked with it from a dev perspective unfortunately; basically all my work is just backend/server stuff
16:53 imirkin_: k
16:53 imirkin_: well, if you're up for a very steep learning curve, i can probably assist
16:54 jscinoz: I can give it a go at least, where should I start with things?
16:54 imirkin_: well... try to figure out what's going wrong with the applications you tried
16:54 jscinoz: Ah right, let me just try something native first, because the first thing i tried was via wine
16:55 jscinoz: and while it does render properly with the intel driver, probably complicating things needlessly
17:04 jeremySal: imirkin: I have some dumps for the fragment_shader_interlock extension
17:12 imirkin_: jeremySal: cool
17:14 jeremySal: http://columbia.edu/~jas2312/fragment_shader_interlock.tar.xz
17:14 imirkin_: i'll check it out tonight
17:14 jeremySal: what does $affinity mean in assembly?
17:15 imirkin_: never heard it
17:15 imirkin_: where do you see that?
17:15 jeremySal: demmt put it out
17:15 imirkin_: paste the whole line?
17:15 jeremySal: 01c70000 f0c80000 C mov $r0 $affinity
17:15 jeremySal: it's blue
17:15 imirkin_: oh, it must be a special register
17:15 imirkin_: which nvdisasm called "affinity"
17:15 imirkin_: i haven't the faintest clue what it might be
17:15 jeremySal: oh cool
17:16 imirkin_: doesn't seem like it existed on earlier chips
17:17 jeremySal: also $thread_kill and $laneid
17:17 jeremySal: have you seen those before?
17:17 imirkin_: ya
17:17 imirkin_: thread_kill is 1 or 0 depending on whether that invocation is "live"
17:17 imirkin_: it might not be live if (a) you did a discard or (b) it's a helper invocation
17:17 imirkin_: laneid is ... the id of the lane ;)
17:18 imirkin_: like if you have 32 invocations running in parallel
17:18 jeremySal: what is the vote instruction?
17:18 imirkin_: each one of them get a different number
17:18 jeremySal: yeah that makes sense
17:18 imirkin_: vote compares the value of a predicate across all lanes
17:18 imirkin_: and stores the aggregate result somewhere
17:18 imirkin_: so like you can do "vote all" or "vote any"
17:18 imirkin_: or some third one
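A toy model of what is described above, assuming a group of 32 lanes where each lane holds one boolean predicate and `laneid` is simply the lane's index. This is plain Python, not GPU code, and the function names are made up for illustration. Only the "all" and "any" modes mentioned in the conversation are modelled.

```python
LANES = 32  # number of invocations running in parallel, as in the example above


def vote_all(predicates):
    """True only if the predicate is true in every lane."""
    return all(predicates)


def vote_any(predicates):
    """True if the predicate is true in at least one lane."""
    return any(predicates)


# Each lane evaluates the same predicate on its own lane id.
lane_ids = list(range(LANES))
lower_half = [laneid < 16 for laneid in lane_ids]
everyone = [laneid < LANES for laneid in lane_ids]

print(vote_all(lower_half), vote_any(lower_half))  # not all lanes agree, but some do
print(vote_all(everyone), vote_any(everyone))      # every lane agrees
```

Every lane sees the same aggregate result, which is what makes the instruction useful for deciding whether a whole group can skip a branch.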
17:19 jeremySal: I see
17:19 jeremySal: the ptx documentation lists the instructions but I can't seem to find detailed information about an individual instruction
17:22 imirkin_: well, PTX isn't the same thing
17:22 jeremySal: yeah I know
17:22 imirkin_: PTX is a fake-o ISA which kinda-sorta maps onto the real thing
17:22 jeremySal: I just don't have the best assembly experience
17:22 imirkin_: but it can have valuable info often
17:22 jeremySal: so something like mov32i, it makes me wonder what the third argument is
17:23 imirkin_: yeah, nfc
17:23 imirkin_: blob always uses 0xf there
17:23 imirkin_: so do we :)
17:23 imirkin_: problem solved.
17:23 jeremySal: lol
17:23 imirkin_: arguably it's not even an extra arg
17:23 imirkin_: it *could* be a predicate maybe? but that'd be odd too...
17:24 jeremySal: $p0 fadd32i cc $r63 abs $r7 0xfd801fe2
17:24 jeremySal: would be a predicated add
17:24 imirkin_: that's a dirty lie
17:24 imirkin_: in theory yes
17:24 imirkin_: in practice that's a misinterpreted sched line
17:24 imirkin_: remember, every 4th instruction should be a sched
17:24 jeremySal: ooohh
17:24 jeremySal: that's evil
17:24 imirkin_: a little yeah :)
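A small sketch of how a disassembly could be checked for this, under two assumptions that are not stated in the log: every instruction word is 8 bytes, and the sched word is the first word of each group of four. The helper name `is_sched_slot` is hypothetical. The offsets 0x08 and 0x58 are the addresses of the `cal` and `mov` lines quoted later in this log.

```python
WORD_SIZE = 8  # bytes per 64-bit word (assumption)
GROUP = 4      # one sched word per group of four words, per the remark above


def is_sched_slot(offset):
    """Return True if the word at this byte offset would be a sched word,
    assuming the sched word comes first in each group of four."""
    return (offset // WORD_SIZE) % GROUP == 0


for offset in (0x00, 0x08, 0x20, 0x58):
    kind = "sched" if is_sched_slot(offset) else "instruction"
    print(f"{offset:#06x}: {kind}")
```

A disassembler that decodes a sched slot as a normal opcode will print something that looks like a real instruction, which is the trap described above.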
17:25 jeremySal: what does cc mean?
17:25 jeremySal: and abs in that context
17:25 jeremySal: or is that just nonsense
17:25 imirkin_: set carry
17:25 imirkin_: (or technically, condition code)
17:25 jeremySal: what about abs? absolute value?
17:25 imirkin_: yep
17:26 imirkin_: this isn't your average cpu's ISA :) tons of modifiers for all this junk
17:26 jeremySal: so how does that interact with carry flag?
17:26 imirkin_: actually, i'm not sure what "cc" does in the context of fadd
17:27 jeremySal: wait, it's a float addition
17:27 imirkin_: might just get set when the result is non-0
17:27 imirkin_: mwk: do you know what CC does for floats?
17:27 jeremySal: is abs() taken after the addition or before it?
17:27 imirkin_: the abs applies to the argument
17:28 jeremySal: the float or the integer?
17:28 mwk: imirkin_: by CC you mean the $c register?
17:29 imirkin_: mwk: the thing that does like R0.CC in nvdisasm output. i think $c yeah
17:29 mwk: basically, carry and overflow flags are never set
17:29 mwk: zero flag is set when result is zero or NaN, sign flag is set when result is <0 or NaN
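The rules just stated can be written down as a short Python sketch. It models only what is said above (carry and overflow never set, zero set on zero or NaN, sign set on negative or NaN); the function name and the dictionary layout are made up for illustration and say nothing about how the hardware encodes the flags.

```python
import math


def fp_condition_flags(result):
    """Flags for a floating-point result, following the description above."""
    nan = math.isnan(result)
    return {
        "zero": nan or result == 0.0,  # set when the result is zero or NaN
        "sign": nan or result < 0.0,   # set when the result is < 0 or NaN
        "carry": False,                # never set for float ops
        "overflow": False,             # never set for float ops
    }


for value in (0.0, -1.5, 2.0, float("nan")):
    print(value, fp_condition_flags(value))
```

Note that NaN is the only input that sets both the zero and the sign flag in this model.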
17:29 imirkin_: mwk: are those set separately on fermi+?
17:29 mwk: huh?
17:30 jscinoz: imirkin_: interestingly of the native linux applications i've tried with DRI_PRIME=1, all simply hang with a black screen
17:30 imirkin_: mwk: well i know there's the funny condition registers on nv50... wasn't sure how it worked on nvc0+
17:30 jscinoz: whereas a game (WoW) in wine worked, but with font rendering issues
17:30 imirkin_: jscinoz: try glxgears?
17:31 jscinoz: will do, still trying to get openarena to actually exit
17:31 jscinoz: it didn't even die from a kill -9
17:31 mwk: imirkin_: g80 has $c0-$c3 and that's it; gf100 has $flags, which has two things in it: 4-bit $c (working pretty much like any of the G80 $cX registers), and single-bit $p0-$p6
17:31 jscinoz: oh there we go, it just took a while to do it
17:31 jscinoz: imirkin_: glxgears also just gets a black window
17:32 mwk: on g80, $cX were apparently often used as predicates, so on Fermi they replaced it with proper single-bit predicates, and only left the singular $c
17:32 jscinoz: and seems near frozen - takes 10-30 seconds to respond to ctrl-c (or even kill -9)
17:32 mwk: which is mostly used for multi-precision integer ops, but if you really want there are some opcodes that use it for comparisons, logic ops, etc.
17:33 drathir: imirkin_: yea mostly touch it when first os install and in situation i need made remote unlock of hdds...
17:33 imirkin_: jscinoz: weird. anything in dmesg?
17:33 imirkin_: mwk: right... but you can do .CC on even fp ops
17:34 imirkin_: which is weird
17:34 mwk: yup, as I said - it still has all the g80 features, except nobody seems to use it
17:34 drathir: jscinoz: if You like node take a look on cjdns ^^
17:35 mwk: oh, and only a handful of ops can be predicated with $c, while most can be predicated with $p
17:35 mwk: schizo regs.
17:35 jscinoz: imirkin_: nouveau 0000:01:00.0: openarena[21061]: failed to idle channel 2 [openarena[21061] twice
17:36 jscinoz: drathir: Ah yep, I used to have a node on hyperboria a while back, but I kind of forgot about it, unfortunately
17:36 imirkin_: mwk: some sort of docs around that would be *great* :)
17:36 jscinoz: imirkin_: same thing in dmesg for glxgears also
17:36 imirkin_: jscinoz: nothing before that?
17:37 jeremySal: imirkin: what does it mean if there is just an exit instruction in the middle of a block of code with no flow control before it? How is the rest of the code reached?
17:37 jscinoz: imirkin_: not to do with glxgears, but from other things i ran, let me pastebin full dmesg
17:38 drathir: jscinoz: xonotic as game on linux You can also check...
17:38 imirkin_: jeremySal: pastebin :)
17:38 mwk: imirkin_: it's on my TODO list
17:39 jscinoz: imirkin_: https://gist.github.com/anonymous/d8cc824d8923d222c0e4
17:39 jscinoz: drathir: yep, i'll install that and give it a try
17:39 drathir: jscinoz: wow nice to hear that...
17:40 imirkin_: jscinoz: hm, sad. dunno =/
17:40 imirkin_: jscinoz: skeggsb might be better placed to provide help
17:43 jeremySal: imirkin: http://pastebin.com/xLLasbDY
17:44 jscinoz: imirkin_: Ah, fair enough, thank you for all your help :)
17:47 jscinoz: It's so odd that a WINE game can run with PRIME (albeit with font issues), yet anything native fails to run at all
17:47 jscinoz: oh, strange, now that's not running either... I'm going to reboot; something odd is going on here
17:48 imirkin_: jeremySal: cal = call
17:49 jeremySal: imirkin: but it has no arguments?
17:50 imirkin_: yeah
17:50 jeremySal: how does it specify what it's calling?
17:50 imirkin_: calls never have arguments
17:50 imirkin_: oh. like "cal 0x58"
17:50 imirkin_: means "jump to 0x58" :)
17:50 jeremySal: yeah, but I'm not seeing an address
17:50 imirkin_: 0x58 is the address.
17:50 jeremySal: yeah but in the example demmt
17:51 jeremySal: it does not have something like "0x58"
17:51 imirkin_: 00000058: 01c70000 f0c80000 C mov $r0 $affinity
17:51 imirkin_: 00000008: 04800040 e2600000 cal 0x58
17:51 imirkin_: yes?
17:51 jeremySal: huh? i just don't see 0x58
17:51 imirkin_: 00000058
17:51 jeremySal: oh my god
17:51 jeremySal: the gamma on my monitor
17:51 jeremySal: or something
17:51 imirkin_: uh huh
17:51 jeremySal: it's /perfectly/ invisible
17:51 jeremySal: the address
17:52 jeremySal: it just looks like cal to me
17:52 imirkin_: likely excuse
17:52 jeremySal: I actually can't get it to show anything there
17:53 imirkin_: well i'm just looking at your paste.
17:53 jeremySal: ok I swear this is a bug
17:53 jeremySal: yeah, I think it's printing in white text
17:53 imirkin_: it should be bright white
17:53 jeremySal: yeah but my background is bright white
17:53 jeremySal: ooooh
17:53 imirkin_: then that's a problem :)
17:54 imirkin_: i don't think envydis is smart enough to take current settings into account
17:54 jeremySal: haha okay
17:54 drathir: jscinoz: wine-staging using?
17:54 imirkin_: but who uses a white background anyways
17:54 jeremySal: I'm colorblind, but not *that* colorblind
17:54 jeremySal: imirkin: weirdo's mostly
17:54 imirkin_: not like it's the default for xterm :)
17:54 jeremySal: *weirdos
17:55 imirkin_: good thing red-blue color blindness isn't a thing
17:55 imirkin_: otherwise there'd be a lot more RGBA vs BGRA bugs
17:55 jeremySal: so, I guess I should ask
17:55 jeremySal: are there red vs green things in the output?
17:55 jeremySal: the registers look different than the constants
17:56 jeremySal: I'm guessing registers=red constants green?
17:56 imirkin_: probably, but the actual colors aren't important
17:56 imirkin_: iirc those are all blue, but i don't remember
17:56 drathir: jscinoz: 3d fun?
17:56 drathir: jscinoz: tabfail...
17:56 drathir: jeremySal: ^
17:57 jscinoz: drathir: yeah, wine1.8 with the staging patches. going to reboot momentarily though, since now not even that is working with DRI_PRIME=1, so i suspect maybe my card needs to be reinitialized or something
17:58 jeremySal: wait, what is $r0?
17:58 imirkin_: jeremySal: https://github.com/envytools/envytools/blob/master/util/colors.c#L45
17:58 imirkin_: $r0 is register 0
17:58 imirkin_: there are 255 registers, although $r255 is reserved in that it always contains the value 0
17:59 jeremySal: ok because $r0 is definitely not blue to me
17:59 imirkin_: it's intel-style syntax
17:59 imirkin_: so mov $r0 $affinity <-> $r0 = $affinity
17:59 jeremySal: yeah that's what I expected
17:59 imirkin_: reg is color 31 :)
18:00 jeremySal: I switched to a different theme where the address is now off-white instead of pure white :)
18:01 jeremySal: ooh, and now I see that the address of like 0x58 is colored on that line as well, I was confused about that as well.
18:02 jeremySal: what are the s registers?
18:03 mwk: where do you see an s register?
18:03 jeremySal: mov $r2 $s15
18:03 jeremySal: When using the shader_interlock extension
18:03 mwk: is it annoyingly bright red? seems like an unknown special register
18:03 jeremySal: http://pastebin.com/xLLasbDY
18:04 jeremySal: I'm colorblind, so annoying bright red looks the same as red/brown to me
18:04 jeremySal: ohh, hmm no actually I'm pretty sure it doesn't
18:04 mwk: these have individual names based on the purpose, so you should never see one unless it's an unknown
18:05 mwk: yeah, it's an unknown special register
18:05 mwk: if you have some idea what it is, patches welcome
18:05 jeremySal: ok cool
18:11 mwk: ugh
18:12 mwk: we really need to sync names between g80/gf100/gk110/gm107
18:12 mwk: it's $srX on g80/gf100/gk110, but $sX on gm107
18:14 mwk: and the actual sr names too, for that matter
18:15 jeremySal: so in a fragment shader, if you access the register $r4 before it's written to, what value will it have?
18:16 jeremySal: nevermind I realized that's not happening
18:16 mwk: jeremySal: undefined
18:17 mwk: whatever the previous shader left in there
18:17 jeremySal: ok thanks
18:17 mwk: except the registers are not actually fixed, but mapped to actual hw memory via a complex mapping
18:17 mwk: so your $r4 might be some previous vertex shader's $r9
18:17 jeremySal: oh that's interesting
18:17 mwk: in other words - totally indeterminate
18:18 mwk: yeah well
18:18 mwk: each processor has a huge RAM used as backing storage for $r registers
18:19 jeremySal: oh they're not actually registers?
18:19 jeremySal: i mean like they're not part of the core?
18:19 mwk: when it wants to run a shader block (a warp), it waits until there's enough space in that RAM, and then allocates a chunk of the right size and launches it
18:19 jeremySal: How do they lay it out so ram access is fast?
18:20 mwk: "actually registers" is a fuzzy concept
18:20 mwk: registers are RAM
18:20 jeremySal: yeah for sure
18:20 mwk: and the $r backing RAM is right there inside the processor
18:20 jeremySal: I guess I thought of registers as the fastest memory
18:20 jeremySal: where their access is hardcoded into the instruction set
18:20 mwk: crazy fast
18:20 mwk: the access path is also crazy wide
18:20 jeremySal: but it's shared between the cores?
18:21 mwk: no
18:21 mwk: it's per-core
18:21 jeremySal: then why do they not have fixed registers?
18:22 mwk: and the bus width to the register bank is 512 bits wide... and there are 4 banks per core... and that was on Tesla, the numbers are now probably 8× bigger
18:22 mwk: because they do a *lot* of stuff in parallel
18:22 mwk: a single core can have thousands of threads running
18:22 jeremySal: oh what?
18:23 jeremySal: I guess I don't really know about the architecture
18:23 jeremySal: I just assumed it was a bunch of really simple CPU-like cores
18:23 jeremySal: running the same code
18:23 jeremySal: but that's probably wrong
18:23 mwk: haha.
18:23 mwk: okay, bear with me
18:23 mwk: the GPU has between 1-16 processors
18:24 mwk: known as SMs (streaming multiprocessors), or MPs, or whatever
18:24 mwk: roughly corresponding to a CPU core, but only roughly
18:25 mwk: each of these has up to 48 "warps" running in parallel, similar to how hyperthreading CPUs have 2 or 4 threads running in parallel - as in, every cycle the CPU decides on which thread to run
18:26 mwk: this is meant to hide the latency of instructions - when one warp is waiting for a load from memory, other warps can be busy on the processor
18:26 jeremySal: yeah
18:26 imirkin: mwk: yeah, ben did the gm107 one to be close to nvdisasm
18:26 mwk: and now the big trick - each "warp" is up to 32 threads, running in lockstep
18:26 imirkin: mwk: not a huge fan of how it turned out but... wtv
18:27 mwk: that works a bit like SIMD on a CPU, but only a bit
18:27 jeremySal: is it more or less powerful than SIMD?
18:27 mwk: because the GPU tries hard to give you the illusion of independent threads
18:27 mwk: in reality, the threads in a warp have to execute the same instructions
18:28 jeremySal: so this is where predicated instructions come in?
18:28 mwk: yes
18:28 mwk: this is meant to keep the threads executing the same path
18:28 mwk: if you do a conditional branch, and the threads in a warp don't agree on the direction, the warp will be split - some threads will become inactive, and will be resumed later once the first group finishes
18:29 mwk: which is why divergent conditional branches are expensive, and there are a *lot* of tricks that are meant to rejoin threads on a single execution path after such a split
18:29 mwk: so - you can have thousands of threads executing on an SM
18:30 jeremySal: I see, so it's not that every single instruction is predicated, you can just put the core in an "inactive" state
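A minimal sketch of the divergence behaviour mwk describes, assuming a toy model in which a warp is a list of per-lane values and each side of a branch runs as a separate pass over the lanes that chose it. The function name `run_divergent_branch` and the pass counting are illustrative only; real hardware uses reconvergence mechanisms that are not modelled here.

```python
WARP_SIZE = 32  # threads per warp, as stated in the chat


def run_divergent_branch(values, cond, then_fn, else_fn):
    """Run an if/else over one warp's lanes, one side of the branch at a time.

    Returns the per-lane results and the number of passes the warp needed.
    """
    taken = [cond(v) for v in values]  # per-lane branch decision
    out = list(values)
    passes = 0

    # First pass: only lanes that took the branch are active.
    if any(taken):
        passes += 1
        for lane, t in enumerate(taken):
            if t:
                out[lane] = then_fn(values[lane])

    # Second pass: the lanes that were inactive are resumed.
    if not all(taken):
        passes += 1
        for lane, t in enumerate(taken):
            if not t:
                out[lane] = else_fn(values[lane])

    return out, passes


# All lanes agree: one pass. Lanes disagree: both sides run, so two passes.
uniform_out, uniform_passes = run_divergent_branch(
    [1] * WARP_SIZE, lambda v: v > 0, lambda v: v + 1, lambda v: v - 1)
split_out, split_passes = run_divergent_branch(
    list(range(WARP_SIZE)), lambda v: v % 2 == 0, lambda v: v + 1, lambda v: v - 1)
```

In this model the cost of a divergent branch shows up as the extra pass: the warp spends time on both sides even though each lane only needs one of them.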
18:30 mwk: that means, if you want 8 GPRs for your program, you really have to have, say, 8k registers on the processor
18:30 jeremySal: what is an SM?
18:31 mwk: 03:24:02 mwk: known as SMs (streaming multiprocessors), or MPs, or whatever
18:31 mwk: but 8 GPRs is ridiculously few, some programs may require hundreds of GPRs
18:31 mwk: but 100 GPRs * 1000 threads is way too many to fit on an SM
18:32 mwk: so - they made GPR count a regulable parameter
18:32 jeremySal: what is a GPR?
18:32 mwk: if you choose 8 GPRs, you can run lots of threads in parallel, since they'll all fit in the 64k registers they actually have on the chip
18:32 mwk: general purpose register
18:33 jeremySal: ah
18:33 mwk: but if you choose 128 GPRs, the GPU will limit simultaneously running threads (warps really), because otherwise it'd run out of registers
18:33 mwk: this is why there has to be dynamic allocation of the GPRs in a huge RAM
18:34 mwk: oh, and for 3d work, each type of shader can have a different amount of needed GPRs, complicating the equation
18:34 mwk: vertex shader needs 8 GPRs, pixel needs 4, geometry needs 11
18:35 jeremySal: Okay
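The trade-off between GPR count and parallelism can be sketched with some rough arithmetic. This assumes the figures quoted in the chat (a 64k-register file, 32 threads per warp, up to 48 warps per SM) all apply to the same chip, which the chat does not confirm, and it ignores any allocation granularity the hardware may impose. `resident_warps` is a hypothetical helper name.

```python
REGISTER_FILE = 64 * 1024  # registers per SM ("64k" in the chat)
WARP_SIZE = 32             # threads per warp
MAX_WARPS = 48             # per-SM warp limit quoted in the chat (assumed same chip)


def resident_warps(gprs_per_thread):
    """Estimate how many warps fit on one SM for a given GPR count per thread."""
    regs_per_warp = gprs_per_thread * WARP_SIZE
    return min(MAX_WARPS, REGISTER_FILE // regs_per_warp)


few = resident_warps(8)     # limited by the warp cap, not by registers
many = resident_warps(128)  # limited by the register file
```

With 8 GPRs a warp needs 256 registers, so the warp cap is the limit; with 128 GPRs a warp needs 4096 registers and only 16 warps fit in this model.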
18:35 jeremySal: so all the types of opengl shaders are the same from the hardware point of view?
18:35 jeremySal: there aren't special processors for the different types?
18:35 mwk: of course not
18:35 mwk: the hardware does differentiate between them
18:35 mwk: but they do execute on the same processors
18:36 mwk: with slight differences in instruction set
18:36 mwk: the "interpolate input" instruction only works in pixel shaders, for instance
18:38 mwk: imirkin: tbh I like fadd/dadd/iadd better than add b32/f32/f64
18:38 mwk: but there are some annoyances in there too
18:38 mwk: "fadd32i"...
18:39 imirkin: right
18:39 mwk: still, I suppose it'd be better to follow nv... I started using nv names in my Tesla tests, I'm considering stuffing them in the docs/envydis too
18:39 imirkin: well if you want to normalize it, go for it
18:39 imirkin: just... please update the library code in mesa if you do
18:39 imirkin: and the ddx
18:41 mwk: right... well, I'll see about that
18:58 drathir: imirkin: are there any other ways to debug than dmesg for where access is rejected with mpv+vdpau?
18:58 imirkin: drathir: strace -f -e open
18:58 jeremySal: mwk: thanks for the explanation
18:59 drathir: like to catch as many logs as possible before giving the pc back to the customer ;p
18:59 jeremySal: What are the c0[0x4260] etc addresses?
19:00 imirkin: constbuf 0, address 0x4260
19:01 jeremySal: imirkin: so like constants provided by the cpu?
19:01 jeremySal: could you find them in the demmt trace?
19:02 jeremySal: like the values they are set to?
19:23 jeremySal: lop pass_b 0x1 $r4 0x0 inv $r4
19:23 jeremySal: what does the 0x1 and 0x0 mean in this case?
20:13 imirkin: constants are more like uniforms in GL
20:14 imirkin: so c0 might correspond to the glUniform* values, or ubo 0... depends
20:14 imirkin: 0x1 is probably PT (i.e. the always-true predicate, used as a dummy destination)
20:14 imirkin: 0x0 is probably the always-zero register (RZ, aka R255). note that this logic op always passes the "second" value, so it doesn't matter what the first arg is
20:40 jeremySal: imirkin: but wouldn't that imply there are four arguments? 0x1 $r4 0x0 and (~$r4)
20:41 imirkin: no
20:41 imirkin: 0x1 and $r4 are destinations
20:41 imirkin: 0x0 and inv $r4 are the sources
20:42 jeremySal: oh huh
20:42 jeremySal: so it's like a simd operation?
20:42 imirkin: no
20:42 imirkin: it's a 2-source op
20:43 jeremySal: yeah that makes sense, but a 2-destination op doesn't make sense to me
20:43 imirkin: there are a bunch of logic ops... like "and", etc. this one is "pass b" aka second arg
20:43 imirkin: well
20:43 imirkin: the predicate is set based on some rules
20:43 imirkin: i'm guessing based on whether the result is 0 or not
20:44 jeremySal: I see
20:44 jeremySal: is there no "NOT" instruction?
20:44 imirkin: there is... this is it :)
20:47 jeremySal: isetp ne u32 and $p0 0x1 $r4 0x0 0x1
20:47 jeremySal: so this would set the predicate p0
20:48 jeremySal: but not sure how it decides?
20:49 imirkin: if $r0 != 0
21:11 jeremySal: imirkin: I assume you mean if $r4 !=0?
21:11 jeremySal: also how would I decode the 0x1s?
21:13 imirkin: yes, that is what i meant.
21:13 imirkin: 0x1 is generally P7, aka always-true
21:13 imirkin: so like... isetp sets *2* predicates somehow
21:13 imirkin: i haven't really looked into whether it sets them to the same value or what
21:14 imirkin: and it's isetp *and*, so the result is actually and'd with the last arg, which is always-true
21:14 imirkin: but it doesn't have to be
21:14 jeremySal: yeah
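A rough model of the two instructions as imirkin reads them, assuming 32-bit registers. The predicate rule for `lop` (set when the result is non-zero) is his guess, not confirmed behaviour, and the function names `lop_pass_b` and `isetp_ne_and` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def lop_pass_b(a, b, invert_b=False):
    """Model 'lop pass_b': ignore the first source, pass the second through.

    With invert_b=True this acts as a bitwise NOT of b.
    Returns (register result, predicate result).
    """
    if invert_b:
        b = ~b & MASK32
    result = b & MASK32
    pred = result != 0  # guess from the chat: predicate reflects non-zero result
    return result, pred


def isetp_ne_and(lhs, rhs, pred_in=True):
    """Model 'isetp ne ... and': (lhs != rhs) AND'd with an input predicate."""
    return (lhs != rhs) and pred_in


# lop pass_b 0x1 $r4 0x0 inv $r4  ->  $r4 = ~$r4 (first source is RZ, ignored)
r4, _ = lop_pass_b(0x0, 0x00000001, invert_b=True)
# isetp ne u32 and $p0 0x1 $r4 0x0 0x1  ->  $p0 = ($r4 != 0) and always-true
p0 = isetp_ne_and(r4, 0x0, pred_in=True)
```

The second value each function returns or consumes stands in for the predicate operands written as 0x1 in the disassembly, which imirkin takes to be the always-true predicate.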
21:19 jeremySal: how big is the thread_kill register?
21:23 imirkin: theoretically 32-bit, but it's only ever 1 or 0
21:23 imirkin: (but obviously each thread in the warp can have a diff value)
21:24 jeremySal: yeah
21:25 jeremySal: so why would they load it, invert it, and then check if it's not equal to zero?
21:25 jeremySal: 000000d8: 01370004 f0c80000 C mov $r4 $thread_kill
21:25 jeremySal: 000000e0: fec0072f 001fb401 sched 0x72f 0xff6 0x7ed
21:25 jeremySal: 000000e8: 00070005 f0c80000 mov $r5 $laneid
21:25 jeremySal: 000000f0: 0047ff04 5c470700 lop pass_b 0x1 $r4 0x0 inv $r4
21:25 jeremySal: 000000f8: 0ff70407 5b6a0380 isetp ne u32 and $p0 0x1 $r4 0x0 0x1
21:25 imirkin: because they're adding it in at the wrong point in their optimizer flow
21:26 imirkin: and it doesn't end up going through the thing that would have figured out they don't have to do something dumb like that
21:26 imirkin: or i'm the idiot :)
21:26 jeremySal: haha
21:26 jeremySal: so wait the f0 instructions
21:26 jeremySal: *instruction
21:27 jeremySal: inv is logical not or boolean not?
21:28 jeremySal: Also "vote any $r4 0x1 $p0"
21:28 imirkin: aren't those the same?
21:28 imirkin: it's a bitwise not though
21:28 jeremySal: what does the 0x1 do there?
21:28 jeremySal: yeah I meant bitwise
21:28 jeremySal: terminology skipped my brain
21:29 imirkin: again... vote can produce a predicate or a register value
21:29 imirkin: in this case the result goes into $r4 and $p7. $p7 == 0x1
21:29 imirkin: since it's always-true
21:29 jeremySal: ok
21:29 jeremySal: i didn't know if it was comparing $p0 to 0x1
21:30 imirkin: no
21:30 imirkin: it looks at $p0 in every thread
21:30 imirkin: and if *any* of them are true, the $r4 gets set to 0xffffffff (i think)
21:30 imirkin: or maybe 1
21:30 jeremySal: ok
21:32 jeremySal: flo instruction?
21:33 imirkin: find low bit
21:35 jeremySal: what would cause thread_kill to be set? if there is no mov to that register, is there something else going on?
21:36 imirkin: if you kill the thread (aka discard in glsl)
21:36 imirkin: also if it's a helper invocation
21:36 imirkin: like if you have a 2x2 quad and one of the fragments isn't covered
21:36 imirkin: but it still has to run as a helper invocation to make derivatives & co work
21:38 jeremySal: doesn't discard occur in the shader?
21:38 jeremySal: what instruction would cause that?
21:50 imirkin: kill :)
21:51 jeremySal: so if I don't see a kill instruction in the shader code
21:51 jeremySal: that would imply it's only set based on being a helper invocation?
21:51 jeremySal: Also, I think the vote instruction sets bits according to which threads agree
21:51 imirkin: doubtful
21:52 imirkin: that might be one of the modes
21:52 jeremySal: Because it then immediately executes flo on the result
21:52 imirkin: but vote any/vote all don't i think
21:52 jeremySal: and compares it to the lane id
21:52 jeremySal: which would be basically
21:52 jeremySal: the "first" thread to vote yes
21:52 jeremySal: if I'm guessing correctly
21:53 jeremySal: 00000108: 00070004 50d9e000 vote any $r4 0x1 $p0
21:53 jeremySal: 00000110: 00470004 5c300000 flo u32 $r4 $r4
21:53 jeremySal: 00000118: 00570407 5b640380 isetp eq u32 and $p0 0x1 $r4 $r5 0x1
21:53 imirkin: perhaps.
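A sketch of jeremySal's reading of the vote/flo/isetp sequence, assuming that vote yields a per-lane bitmask (imirkin doubts this for `vote any`) and that flo picks the lowest set bit, following the "find low bit" description given in the chat. None of this is confirmed behaviour, and all function names are hypothetical.

```python
def ballot(preds):
    """Build a bitmask with bit N set when lane N's predicate is true."""
    mask = 0
    for lane, p in enumerate(preds):
        if p:
            mask |= 1 << lane
    return mask


def lowest_set_bit(mask):
    """Return the index of the lowest set bit, or None if mask is zero."""
    if mask == 0:
        return None
    return (mask & -mask).bit_length() - 1


def elected_lanes(preds):
    """Per-lane result of comparing the chosen bit index against the lane id."""
    first = lowest_set_bit(ballot(preds))
    return [lane == first for lane in range(len(preds))]


# Lanes 5 and 9 vote yes; only lane 5 ends up with a true predicate.
preds = [lane in (5, 9) for lane in range(32)]
elected = elected_lanes(preds)
```

Under these assumptions the sequence elects exactly one thread out of those whose predicate was set, which matches the "first thread to vote yes" guess.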
21:55 jeremySal: ptx has the vote instruction, but I really can't find any documentation about it?
21:58 jeremySal: ah nevermind I found it
21:58 imirkin: you could be right, dunno
21:59 jeremySal: hmm so there is a "ballot mode" that does this
21:59 jeremySal: according to the ptx docs
21:59 jeremySal: but demmt marks the instruction as vote any
21:59 jeremySal: maybe it's decoding the instruction wrong?
22:00 imirkin: unlikely.
22:01 skeggsb: imirkin: i've got an initial version of maxwell tic working on gm107 btw, can render xonotic more or less, but it makes piglit very unhappy still :P
22:01 skeggsb: but - progress
22:02 imirkin: skeggsb: ok cool
22:02 skeggsb: hopefully it'll all "just work" when we can switch on gm20x :)
22:02 imirkin: skeggsb: if you want me to have a look, let me know
22:02 imirkin: skeggsb: you should be able to test on GM107 though
22:02 skeggsb: yeah, i'm using a gm107 for it
22:02 imirkin: k
22:03 imirkin: someone was having trouble with it earlier... everything was hanging
22:03 skeggsb: oh, weird. mine is seemingly functional
22:03 imirkin: yeah, i'm sure there's some variability going on :)
22:04 imirkin: if you search for your nick in the scrollback, i think i pinged you :)
22:04 imirkin: it was with PRIME
22:04 imirkin: and dri3
22:04 imirkin: dunno if any of that matters
22:04 jeremySal: "ballot" doesn't seem to show up in the envytools source except in ptxgen/ptxgen
22:32 skeggsb: well, that's a more acceptable pass rate to begin with
22:32 skeggsb:leaves piglit running in screen and goes to get beer
22:35 imirkin: that'll make any additional piglit fails disappear...