01:00mupuf: In case someone wondered, hakzsam tested ssbo/atomics on maxwell GM1xx and it worked
01:00mupuf: hakzsam: tested dEQP too?
01:00hakzsam: only piglit
01:00hakzsam: that should be enough
02:20karolherbst: hakzsam: do you plan to do something on reator today?
02:21hakzsam: karolherbst, yeah, this afternoon, but have fun I don't use it right now :-)
02:21karolherbst: hakzsam: nah, currently I don't need it either; I have to craft some vbios files together, which will take me some time actually
02:22hakzsam: karolherbst, okay
03:39jscinoz: Hi, so I have a GTX 960M (Maxwell NVC0). I've been trying to do a mmiotrace to dump the firmware, but I'm having a bit of trouble. I can get the binary driver working correctly, but when I try to load the nvidia kernel module with mmiotrace active it has a kernel oops and attempting to start xorg hangs indefinitely. Given that this only happens when mmiotrace is active, I wonder if nvidia is starting to
03:39jscinoz: take countermeasures against firmware dumping?
03:42mwk: jscinoz: I'd guess it's mere accident.... mmiotrace is not really rock-solid
03:43jscinoz: mwk: ah, any pointers on what I can try? I've tried a few different versions of the nvidia driver and all unfortunately have the same outcome
03:43karolherbst: ohh you could try my patch
03:43jscinoz: I've tried with the latest in my distro (gentoo): 361.18-r4, and also 340.96
03:44mupuf: karolherbst: I was about to say so :D
03:44karolherbst: jscinoz: https://gist.githubusercontent.com/karolherbst/903bf75486134dd9505d/raw/6f97c12da078f8ea3d9e3a62241d12a103f22fab/mmiotrace.patch
03:44karolherbst: mupuf: did you try it out yet?
03:44jscinoz: thanks karolherbst, let me see if i can apply this
03:44karolherbst: mupuf: ohh regarding the repeat issue, we could simply rollback the kernel patch for this one I guess
03:44mupuf: karolherbst: nope, yesterday evening, I finally wrote the list of changes in DRM for linux 4.4
03:44mupuf: yes, I was terribly late :s
03:45karolherbst: jscinoz: well I did this on top of linux 4.4
03:45mupuf: the repeat issue?
03:45jscinoz: alrighty, it applied cleanly
03:45karolherbst: but I didn't check what causes this
03:45jscinoz: karolherbst: yep, i'm on 4.4
03:45jscinoz: building kernel now
03:45mupuf: karolherbst: sorry, ENEEDMORECONTEXT
03:46karolherbst: mupuf: the repeat instruction is called inside kernel assembly
03:46mupuf: oh, I did not follow your discussion enough to have a clue
03:46mupuf: I understood what you explained (related to huge pages)
03:46karolherbst: yeah it is unrelated
03:47jscinoz: with this patch, when trying to mmiotrace, should i use the latest nvidia driver, or one from the 340.XX series (the wiki mentioned the latter)?
03:47karolherbst: jscinoz: doesn't matter
03:47jscinoz: karolherbst: Alright, thanks, will let you know how it goes :)
03:47karolherbst: but you may want to use the latest one generally
03:47karolherbst: normally the newer drivers should do things better (tm)
03:47jscinoz: yeah, i'd prefer to use the latest personally, just wasn't sure if that could be related to the issues when loading the driver while mmiotrace is active
03:49karolherbst: mhh no idea what started to cause the mmiotrace issue though
03:50karolherbst: mupuf: I am sure I had the issue sometime earlier by the way, so maybe it is something volatile like changes in the mtrr/pat code
03:50jscinoz: Ah, so I'm not the only one having this problem?
03:50karolherbst: I also have it
03:50karolherbst: for some reason
03:50jscinoz: I see
03:50karolherbst: but I was able to do mmiotraces some months earlier
03:50karolherbst: and before that I also had the issue at some point
03:50karolherbst: so no clue why
03:51mupuf: stars not aligning :)
03:51karolherbst: I just figured out what was wrong with the mmiotrace code
03:51mupuf: or, in this case, memory pointers
03:51karolherbst: no, don't think so
03:51karolherbst: the pointers are the same usually
03:51karolherbst: I could imagine that linux sometimes changes how it does ioremaps
03:51jscinoz: Awesome :D I'd love to get into kernel dev someday, but I don't even know C thus far; i'm a java/nodejs and most recently rust guy
03:52mupuf: I thought it was an alignment issue?
03:52karolherbst: yeah, but you know, my working traces had the same ioremap range sizes
03:52mupuf: hehe, can't talk right now though
03:53karolherbst: jscinoz: while mmiotracing it might take a while until X shows up, but it shouldn't take more than 2 minutes or something
03:54karolherbst: and shouldn't hang
03:54jscinoz: karolherbst: Noted, going to reboot now to try it out, will report back in 10mins or so
04:06jscinoz: Same result unfortunately. It seems the system doesn't hang in that i can still ssh in, but i can no longer switch to a VT or do anything else on the machine directly
04:06jscinoz: dmesg when trying to mmiotrace, with that patch: https://gist.github.com/anonymous/94352a6a4ec21670765d
04:06jscinoz: same thing happened without the patch, however
04:15pmoreau: jscinoz: FYI, Maxwell is GM10x and GM20x, but NVC0 (aka. GF1xx) is Fermi, so Maxwell NVC0 should not exist
04:16pmoreau: And I doubt the 340.xx series has support for Maxwell cards, even more 2nd Maxwell, but I could be wrong
04:22jscinoz: pmoreau: Huh, i think the wiki is wrong in that case
04:23jscinoz: wait sorry, its not nvc0
04:23jscinoz: but it is maxwell
04:24karolherbst: that is a different issue
04:24jscinoz: https://wiki.freedesktop.org/nouveau/CodeName its nv117/GM107
04:24pmoreau: Sounds better ;-)
04:24jscinoz: card is 960M
04:24jscinoz: karolherbst: ah, so the kernel patch isn't applicable?
04:24karolherbst: not for this one, well you still could get the other issue as well, but this is something odd
04:24pmoreau: I had a look at the latest release of 340.xx (340.96), and it doesn't support the 960M, so you should rather use the latest version.
04:24karolherbst: it happened to me too, but only like once
04:25jscinoz: pmoreau: Ah, noted. the dmesg log above was with the latest nvidia-drivers in gentoo which are 361.18-r4
04:25jscinoz: which is*
04:25karolherbst: well the latest is 361.28 now ;)
04:25karolherbst: but shouldn't matter for this card
04:25jscinoz: ah, will resync anyway though
04:26pmoreau: I've heard about a NULL ptr dereference with recent versions of the NVIDIA driver, but… can't remember where
04:28karolherbst: that __down function is inside linux/list.h
04:29karolherbst: ohh wait
04:29karolherbst: that doesn't make much sense
04:30karolherbst: jscinoz: wanna enable CONFIG_DEBUG_LIST in the kernel?
04:30karolherbst: maybe this will give us a more detailed description of the issue
04:34jscinoz: karolherbst: will do, one moment
04:34jscinoz: Okay, building
04:39drathir: [vo/opengl] after rendering: OpenGL error INVALID_OPERATION
04:40drathir: any ideas about that error?
04:42drathir: with sudo looks like workin...
04:43drathir: user even added to audio and video group...
05:02drathir: try playing through mpv
05:02drathir: but smplayer looks like workin..
05:05jscinoz: okay, rebooting to get log with the extra config option + latest nvidia blob
05:18jscinoz: karolherbst: https://gist.github.com/anonymous/05a24e915bdd353766e4 is dmesg with CONFIG_DEBUG_LIST enabled
05:27jscinoz: Unfortunately, I must turn in now as it's getting rather late down here. Should be around tomorrow evening to continue looking into this issue. Thanks for all your help :)
05:32drathir: looks like mplayer has problems playing with vdpau, but no idea why it works with sudo...
05:33drathir: now somethin new [vo/opengl/vdpau] Before uninitializing OpenGL interop: OpenGL error INVALID_OPERATION.
05:33mlankhorst: do you compile with --shared-glapi ?
05:35drathir: its arch mpv let me check their pkgbuild script...
05:38drathir: --enable-zsh-comp \ --enable-libmpv-shared \ --enable-cdda only see...
05:39drathir: mplayer looks like workin from user and mpv with root only...
07:05drathir: the strangest thing is that it doesn't work as a user but works with sudo, which looks like some kind of access permissions issue...
07:06drathir: now little more info https://gist.github.com/54a56884ecf18eae39ba
07:07drathir: but honestly no idea what that mean ;/
07:07drathir: if that is mpv or nouveau issue connected...
08:14imirkin: drathir: mpv + vo=gl,hwdec=vdpau won't work on nouveau, without substantial changes in nouveau.
08:20imirkin: jscinoz: the firmware won't be in the mmiotrace, sorry
08:51karolherbst: mupuf_: that driver is just plain crazy
08:52karolherbst: I just wiped out the entire cstep table except the first entry and set it as the max for every pstate, result: the driver still reclocks just fine
08:53imirkin: probably still reads from acpi
08:53karolherbst: on reator?
08:53imirkin: oh, no.
08:54karolherbst: I guess the driver has a smart fallback table or something like that
08:54karolherbst: when something is fishy: fallback to tables generated on the fly
08:59karolherbst: well then I confirm the fan_mgmt table then
09:03karolherbst: mupuf_: yep, the fan_mgmt table is kind of doing what I thought it does: 40° 30 "duty": gets me 0x154 duty with 0x438 div: 0.314, close enough
09:04karolherbst: 80° has 45 duty: gets me 0x1e0 with 0x438 div: 0.444
09:06karolherbst: 95° has 100 duty: gets me 0x406 (increasing) with 0x438 div: 0.953
09:07karolherbst: well the gpu is at 0x419 now at 95°
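The ratios quoted above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the raw values and the 0x438 divider come from the log, while the name `duty_fraction` and the table layout are made up for illustration and do not reflect how the vbios or the driver stores anything.

```python
# Divider and raw duty readings exactly as quoted in the log.
PWM_DIV = 0x438

# temperature (deg C) -> (duty listed in the fan_mgmt table, raw duty read back)
readings = {
    40: (30, 0x154),
    80: (45, 0x1E0),
    95: (100, 0x406),
}


def duty_fraction(duty_raw, div=PWM_DIV):
    """Return the raw duty value as a fraction of the divider."""
    return duty_raw / div


for temp, (table_duty, raw) in readings.items():
    print(f"{temp} C: table says {table_duty}, measured {duty_fraction(raw):.3f}")
```

The figures in the log appear to be truncated rather than rounded, so the last digit printed here can differ by one.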
09:16karolherbst: imirkin: is it normal that the vbios I put on the card via pramin differs from the one I fetch via pramin?
09:17karolherbst: and by differs I mean the one read back is shifted
09:17karolherbst: bunch of 0 bytes at the beginning
09:17karolherbst: at 0x5004 the vbios begins
09:17imirkin: don't think i've seen that.
09:17karolherbst: its a gm107 one
09:18karolherbst: ohh wait
09:19karolherbst: everything up to 0x5004 is overwritten by 0 :/
09:19karolherbst: and 0x619f04 is 0x1
09:19karolherbst: just fine
09:20karolherbst: mupuf_: your maxwell also has broken fakevbios
09:20karolherbst: don't know if you knew
09:20imirkin: probably it's a secondary gpu and the first time you loaded nvidia it ran the bios tables
09:23karolherbst: nope, same after clean reboot
09:24karolherbst: I think we have to fix nvafakebios for real, because currently the vbios is written into some memory region not claimed by anybody
09:24karolherbst: with funny side effects
09:44imirkin: karolherbst: what makes you say GK110 is the cutoff? did you have Tom^ check?
09:44karolherbst: yeah I checked this on toms card
09:44karolherbst: I found this out while doing the fsrm stuff
09:44karolherbst: and on his card it was all different
09:45imirkin: ah :)
09:45karolherbst: but there is more in those
09:46karolherbst: THRESHOLD_UNK30C might also be the critical threshold, but I never dug into that any deeper
14:32jscinoz: mm, it seems i now get that null dereference even without trying to mmiotrace
15:34jscinoz: Alright, finally managed to capture a mmiotrace of the binary driver :D
15:34jscinoz: Had to set up the mmiotracing in initramfs; it seemed if i did it later the nvidia module would have an OOPS when loading
15:39Javantea: jscinoz: interesting, what did the oops look like?
15:41jscinoz: Javantea: https://gist.github.com/anonymous/94352a6a4ec21670765d
15:42jscinoz: So, I used demmio with the perl script from https://wiki.freedesktop.org/nouveau/NVC0_Firmware/ this page, it complained about "I don't know which chipset variant to use!" a few times, but it did still produce a number of output files
15:42jscinoz: i renamed them to nv117_XXXXX and put them in /lib/firmware/nouveau; then modprobed nouveau
15:42jscinoz: not sure if it worked; there's no output of any kind from nouveau in dmesg
15:43imirkin_: jscinoz: if you have a GM107, the firmware in kernel 4.1+ should be fine...
15:44jscinoz: Ahaha, really?
15:44jscinoz: So i should just load it normally and 3d will work? I note the main page on the wiki says GM107 acceleration was merged in 4.1
15:44jscinoz: and i'm on 4.4.1
15:45imirkin_: work is a relative term, but yes
15:45imirkin_: unfortunately we've collectively been too lazy to add exa support for maxwell, so you're stuck with glamor
15:46imirkin_: glamor, in turn, triggers bugs in nouveau, so you get bad font rendering
15:46jscinoz: Hmm, let me see if I can get portage to actually install xf86-video-nouveau with glamor enabled, it's not setting the flag for some reason
15:50jscinoz: Ah, i guess it's the default now, that flag only existed on 1.0.11; it's not present on the later ebuilds
15:51imirkin_: glamor is gone entirely, nouveau won't load for maxwell - you end up using the modesetting driver
15:52imirkin_: because the glamor integration in nouveau was broken, and even if it were fixed, that'd just be identical to using modesetting, so... dropping it seemed like the right move
15:52jscinoz: So, if I want to actually use the nvidia gpu in this machine, I have to use the binary driver?
15:52imirkin_: (modesetting has a working glamor integration)
15:53jscinoz: oh okay, i thought glamor was something mandatory for it to work
15:53imirkin_: i meant xf86-video-nouveau won't load for maxwell
15:53imirkin_: it is.
15:53imirkin_: and it will work "fine" with xf86-video-modesetting
15:53jscinoz: but with xf86-video-modesetting (or xf86-video-intel - it's an optimus setup), would the nvidia card still be available as an offload provider in xrandr?
15:53imirkin_: should be, yea
15:54imirkin_: if you're looking to just use it for 3d, then you can just get the intel ddx going with dri3
15:54imirkin_: and not worry about all this ddx stuff
15:54imirkin_: if there are screens on it you want to access, you need to get all this going
15:54jscinoz: Thankfully nope, on this one all the displays are connected to the intel GPU
15:54jscinoz: my last machine had some on the nvidia one which was a bit of a pain
15:54imirkin_: note that there's no reclocking, so it'll probably be slower than the intel gpu
15:55jscinoz: Oh, that's a shame
15:55jscinoz: i'll give it a try anyways, for curiosity's sake
15:55jscinoz: so, just to clarify, in a prime setup, the intel driver needs to be in dri3 mode?
15:55jscinoz: it won't work if its in the default dri2 mode?
15:55imirkin_: needs? no. but i'd recommend it.
15:56imirkin_: if you're using dri2, you need to take more steps. but it should work too.
15:56jscinoz: Alright, i'll go look up how to put it in dri3 mode
15:56imirkin_: build the intel ddx with --enable-dri3
15:56imirkin_: and then stick Option "DRI" "3" into your xorg.conf
15:58jscinoz: Is there a way to check what flags the driver was built with?
15:59imirkin_: not sure
16:00jscinoz: i'll check the build logs. I think it probably has it, as the ebuild has a dep on dri3proto
16:00imirkin_: gentoo force-disables it
16:01jscinoz: Ah, i'll see if i can make it do it with extra_econf
16:02jscinoz: okay, yep it's passing it to configure now
16:07jscinoz: Alright, so I've got DRI3 enabled for the intel driver, but xrandr --listproviders only shows the intel card - there's no provider from nouveau
16:11jscinoz: ah wait i see, i don't need to use xrandr with DRI3
16:11imirkin_: that's right
16:11imirkin_: like i said, fewer steps :)
16:12drathir: imirkin_: but why is mplayer+vdpau working? and the same for mpv+vdpau running with sudo?
16:15imirkin_: drathir: no clue
16:15imirkin_: drathir: probably mplayer + vdpau doesn't actually end up using vdpau for decoding?
16:15jscinoz: I think i might need to rebuild mesa or something. DRI_PRIME=1 glxinfo | grep vendor still shows intel
16:15imirkin_: drathir: does vdpauinfo say that hw decoding is supported?
16:15imirkin_: jscinoz: what version of mesa do you have? and was it built with dri3 support?
16:16jscinoz: imirkin_: 11.1.1 and yes
16:16imirkin_: jscinoz: LIBGL_DEBUG=verbose DRI_PRIME=1 glxinfo |& head
16:16drathir: imirkin_: but it's reported as hw accelerated, and lower cpu usage is visible too... that's becoming a little mystery...
16:16jscinoz: Huh, |&, never seen that token before
16:16imirkin_: same as >&
16:16imirkin_: but with pipe
16:17imirkin_: drathir: ok, dunno
16:17imirkin_: drathir: not sure i care to figure it out :)
16:17jscinoz: I see
16:17imirkin_: jscinoz: pastebin the result of that
16:17jscinoz: imirkin_: https://gist.github.com/anonymous/34a2039969a2e0e2703c
16:17drathir: imirkin_: its 8600gts yes vdpauinfo and vainfo report support of hw acceleration...
16:18imirkin_: jscinoz: hmmmmm... so you have DRI3, that's good. is nouveau loaded?
16:18imirkin_: jscinoz: ls -l /dev/dri
16:20jscinoz: imirkin_: https://gist.github.com/anonymous/85097c12094e8c5790a9
16:20imirkin_: jscinoz: nouveau didn't load properly then... pastebin dmesg?
16:21jscinoz: imirkin_: https://gist.github.com/anonymous/6345bdc6ad769d334220
16:21jscinoz: no mention of nouveau in it at all
16:21imirkin_: i don't see any mentions of nouveau in there
16:21imirkin_: perhaps you have nouveau.modeset=0 somewhere?
16:21imirkin_: blob drivers like to throw that in behind your back
16:21jscinoz: Ah one sec
16:21jscinoz: ah yep, that's in my cmdline
16:22imirkin_: you'll want to remove that :)
16:22imirkin_: modeset=0 means "don't do anything"
16:22jscinoz: will rebuild kernel and retry (EFI system so cmdline has to be hard-coded in kernel unfortunately)
16:22imirkin_: use gummiboot
16:22imirkin_: (that's what i do)
16:22jscinoz: iirc last time i tried without, it resulted in no console
16:22jscinoz: ah will try that out later on; this works for now and i don't want to break things :P
16:23imirkin_: did you have your keyboard plugged into a monitor usb, which in turn was plugged into the actual usb port?
16:23imirkin_: coz that config greatly upset gummiboot
16:23jscinoz: imirkin_: no, it's a laptop - no external peripherals
16:24jscinoz: should i build nouveau into the kernel too, or leave it as a module?
16:24imirkin_: i would recommend leaving it as a module
16:24imirkin_: in fact, you can unload it now
16:24jscinoz: alright, will do
16:24imirkin_: and reload it with modeset=1
16:24jscinoz: should it be in the initramfs?
16:24imirkin_: which should override the kernel parameter
16:24imirkin_: dunno, depends on what you put into initramfs
16:24imirkin_: my initramfs has no kernel-dependent items
16:25imirkin_: just enough to decrypt the disk and keep going
16:25jscinoz: Ah yep, i just found that i was having issues getting a console at all to enter the decryption passphrase
16:26jscinoz: i'll leave it out for now and see how it goes
16:26drathir: imirkin_: always nouveau shouldnt be placed into mkinitcpio modules section?
16:26imirkin_: drathir: depends how your system is set up
16:26imirkin_: drathir: if you're doing a distro-style build, yes
16:27drathir: imirkin_: thanks good to know, didnt know that...
16:30imirkin_: drathir: i update my initramfs once every... never.
16:30imirkin_: once i made it and it worked, i think i updated it once for something dumb, and that was it
16:49jscinoz: imirkin_: That fixed it, glxinfo shows the correct vendor now when DRI_PRIME=1
16:50imirkin_: if you're interested in helping improve the driver on maxwell, let me know
16:51imirkin_: so far i've not really cared about it because GM20x is out of reach for now
16:51jscinoz: Yeah, how can I help?
16:51jscinoz: Unfortunately, as you said, performance is pretty poor and there's some hilarious font & texture issues in games
16:51imirkin_: what's your (relevant) skillset?
16:52jscinoz: not much unfortunately, i'm a dev, but not really with low level stuff. java/nodejs and a tiny bit of rust
16:52jscinoz: fairly strong on the sysadmin/debugging(ish) side of things at least
16:52imirkin_: how about opengl?
16:53jscinoz: Never worked with it from a dev perspective unfortunately; basically all my work is just backend/server stuff
16:53imirkin_: well, if you're up for a very steep learning curve, i can probably assist
16:54jscinoz: I can give it a go at least, where should I start with things?
16:54imirkin_: well... try to figure out what's going wrong with the applications you tried
16:54jscinoz: Ah right, let me just try something native first, because the first thing i tried was via wine
16:55jscinoz: and while it does render properly with the intel driver, probably complicating things needlessly
17:04jeremySal: imirkin: I have some dumps for the fragment_shader_interlock extension
17:12imirkin_: jeremySal: cool
17:14imirkin_: i'll check it out tonight
17:14jeremySal: what does $affinity mean in assembly?
17:15imirkin_: never heard it
17:15imirkin_: where do you see that?
17:15jeremySal: demmt put it out
17:15imirkin_: paste the whole line?
17:15jeremySal: 01c70000 f0c80000 C mov $r0 $affinity
17:15jeremySal: it's blue
17:15imirkin_: oh, it must be a special register
17:15imirkin_: which nvdisasm called "affinity"
17:15imirkin_: i haven't the faintest clue what it might be
17:15jeremySal: oh cool
17:16imirkin_: doesn't seem like it existed on earlier chips
17:17jeremySal: also $thread_kill and $laneid
17:17jeremySal: have you seen those before?
17:17imirkin_: thread_kill is 1 or 0 depending on whether that invocation is "live"
17:17imirkin_: it might not be live if (a) you did a discard or (b) it's a helper invocation
17:17imirkin_: laneid is ... the id of the lane ;)
17:18imirkin_: like if you have 32 invocations running in parallel
17:18jeremySal: what is the vote instruction?
17:18imirkin_: each one of them get a different number
17:18jeremySal: yeah that makes sense
17:18imirkin_: vote compares the value of a predicate across all lanes
17:18imirkin_: and stores the aggregate result somewhere
17:18imirkin_: so like you can do "vote all" or "vote any"
17:18imirkin_: or some third one
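What imirkin_ describes for laneid and vote can be mimicked on the CPU. The sketch below is a plain Python model, not GPU code: `vote_all` and `vote_any` are illustrative names for the two modes mentioned above, the warp width of 32 is taken from the chat, and the third mode is left out because the log does not say what it is.

```python
WARP_SIZE = 32  # invocations running in parallel, per the chat


def vote_all(predicates):
    """True only if the predicate holds in every lane."""
    return all(predicates)


def vote_any(predicates):
    """True if the predicate holds in at least one lane."""
    return any(predicates)


# Each lane sees a different lane id, 0..31.
lane_ids = list(range(WARP_SIZE))

# Example predicate: "my lane id is below 16", evaluated once per lane.
low_half = [lane < 16 for lane in lane_ids]

print("all:", vote_all(low_half))
print("any:", vote_any(low_half))
```

The point is that every lane contributes one predicate bit and all lanes get the same aggregate answer back.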
17:19jeremySal: I see
17:19jeremySal: the ptx documentation lists the instructions but I can't seem to find detailed information about an individual instruction
17:22imirkin_: well, PTX isn't the same thing
17:22jeremySal: yeah I know
17:22imirkin_: PTX is a fake-o ISA which kinda-sorta maps onto the real thing
17:22jeremySal: I just don't have the best assembly experience
17:22imirkin_: but it can have valuable info often
17:22jeremySal: so something like mov32i, it makes me wonder what the third argument is
17:23imirkin_: yeah, nfc
17:23imirkin_: blob always uses 0xf there
17:23imirkin_: so do we :)
17:23imirkin_: problem solved.
17:23imirkin_: arguably it's not even an extra arg
17:23imirkin_: it *could* be a predicate maybe? but that'd be odd too...
17:24jeremySal: $p0 fadd32i cc $r63 abs $r7 0xfd801fe2
17:24jeremySal: would be a predicated add
17:24imirkin_: that's a dirty lie
17:24imirkin_: in theory yes
17:24imirkin_: in practice that's a misinterpreted sched line
17:24imirkin_: remember, every 4th instruction should be a sched
17:24jeremySal: that's evil
17:24imirkin_: a little yeah :)
17:25jeremySal: what does cc mean?
17:25jeremySal: and abs in that context
17:25jeremySal: or is that just nonsense
17:25imirkin_: set carry
17:25imirkin_: (or technically, condition code)
17:25jeremySal: what about abs? absolute value?
17:26imirkin_: this isn't your average cpu's ISA :) tons of modifiers for all this junk
17:26jeremySal: so how does that interact with carry flag?
17:26imirkin_: actually, i'm not sure what "cc" does in the context of fadd
17:27jeremySal: wait, it's a float addition
17:27imirkin_: might just get set when the result is non-0
17:27imirkin_: mwk: do you know what CC does for floats?
17:27jeremySal: is abs() taken after the addition or before it?
17:27imirkin_: the abs applies to the argument
17:28jeremySal: the float or the integer?
17:28mwk: imirkin_: by CC you mean the $c register?
17:29imirkin_: mwk: the thing that does like R0.CC in nvdisasm output. i think $c yeah
17:29mwk: basically, carry and overflow flags are never set
17:29mwk: zero flag is set when result is zero or NaN, sign flag is set when result is <0 or NaN
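mwk's rule for the flags after a float op is short enough to write down as code. This is a sketch of the rule as stated in the chat only; the function name `float_cc_flags` and the dictionary keys are invented here, and cases the chat does not mention (negative zero, for example) simply fall out of the Python comparisons rather than from anything known about the hardware.

```python
import math


def float_cc_flags(result):
    """Flags after a float op, following the rule stated in the chat."""
    is_nan = math.isnan(result)
    return {
        "zero": is_nan or result == 0.0,   # set for zero or NaN
        "sign": is_nan or result < 0.0,    # set for <0 or NaN
        "carry": False,                    # never set for float ops
        "overflow": False,                 # never set for float ops
    }


for value in (0.0, -1.5, 2.0, float("nan")):
    print(value, float_cc_flags(value))
```

Note that a NaN result sets both the zero and the sign flag at once, which a non-NaN result never does under this rule.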
17:29imirkin_: mwk: are those set separately on fermi+?
17:30jscinoz: imirkin_: interestingly of the native linux applications i've tried with DRI_PRIME=1, all simply hang with a black screen
17:30imirkin_: mwk: well i know there's the funny condition registers on nv50... wasn't sure how it worked on nvc0+
17:30jscinoz: whereas a game (WoW) in wine worked, but with font rendering issues
17:30imirkin_: jscinoz: try glxgears?
17:31jscinoz: will do, still trying to get openarena to actually exit
17:31jscinoz: it didn't even die from a kill -9
17:31mwk: imirkin_: g80 has $c0-$c3 and that's it; gf100 has $flags, which has two things in it: 4-bit $c (working pretty much like any of the G80 $cX registers), and single-bit $p0-$p6
17:31jscinoz: oh there we go, it just took a while to do it
17:31jscinoz: imirkin_: glxgears also just gets a black window
17:32mwk: on g80, $cX were apparently often used as predicates, so on Fermi they replaced it with proper single-bit predicates, and only left the singular $c
17:32jscinoz: and seems near frozen - takes 10-30 seconds to respond to ctrl-c (or even kill -9)
17:32mwk: which is mostly used for multi-precision integer ops, but if you really want there are some opcodes that use it for comparisons, logic ops, etc.
17:33drathir: imirkin_: yea mostly touch it when first os install and in situation i need made remote unlock of hdds...
17:33imirkin_: jscinoz: weird. anything in dmesg?
17:33imirkin_: mwk: right... but you can do .CC on even fp ops
17:34imirkin_: which is weird
17:34mwk: yup, as I said - it still has all the g80 features, except nobody seems to use it
17:34drathir: jscinoz: if You like node take a look on cjdns ^^
17:35mwk: oh, and only a handful of ops can be predicated with $c, while most can be predicated with $p
17:35mwk: schizo regs.
17:35jscinoz: imirkin_: nouveau 0000:01:00.0: openarena: failed to idle channel 2 [openarena twice
17:36jscinoz: drathir: Ah yep, I used to have a node on hyperboria a while back, but I kind of forgot about it, unfortunately
17:36imirkin_: mwk: some sort of docs around that would be *great* :)
17:36jscinoz: imirkin_: same thing in dmesg for glxgears also
17:36imirkin_: jscinoz: nothing before that?
17:37jeremySal: imirkin: what does it mean if there is just an exit instruction in the middle of a block of code with no flow control before it? How is the rest of the code reached?
17:37jscinoz: imirkin_: not to do with glxgears, but from other things i ran, let me pastebin full dmesg
17:38drathir: jscinoz: xonotic as game on linux You can also check...
17:38imirkin_: jeremySal: pastebin :)
17:38mwk: imirkin_: it's on my TODO list
17:39jscinoz: imirkin_: https://gist.github.com/anonymous/d8cc824d8923d222c0e4
17:39jscinoz: drathir: yep, i'll install that and give it a try
17:39drathir: jscinoz: wow nice to hear that...
17:40imirkin_: jscinoz: hm, sad. dunno =/
17:40imirkin_: jscinoz: skeggsb might be better placed to provide help
17:43jeremySal: imirkin: http://pastebin.com/xLLasbDY
17:44jscinoz: imirkin_: Ah, fair enough, thank you for all your help :)
17:47jscinoz: It's so odd that a WINE game can run with PRIME (albeit with font issues), yet anything native fails to run at all
17:47jscinoz: oh, strange, now that's not running either... I'm going to reboot; something odd is going on here
17:48imirkin_: jeremySal: cal = call
17:49jeremySal: imirkin: but it has no arguments?
17:50jeremySal: how does it specify what it's calling?
17:50imirkin_: calls never have arguments
17:50imirkin_: oh. like "cal 0x58"
17:50imirkin_: means "jump to 0x58" :)
17:50jeremySal: yeah, but I'm not seeing an address
17:50imirkin_: 0x58 is the address.
17:50jeremySal: yeah but in the example demmt
17:51jeremySal: it does not have something like "0x58"
17:51imirkin_: 00000058: 01c70000 f0c80000 C mov $r0 $affinity
17:51imirkin_: 00000008: 04800040 e2600000 cal 0x58
17:51jeremySal: huh? i just don't see 0x58
17:51jeremySal: oh my god
17:51jeremySal: the gamma on my monitor
17:51jeremySal: or something
17:51imirkin_: uh huh
17:51jeremySal: it's /perfectly/ invisible
17:51jeremySal: the address
17:52jeremySal: it just looks like cal to me
17:52imirkin_: likely excuse
17:52jeremySal: I actually can't get it to show anything there
17:53imirkin_: well i'm just looking at your paste.
17:53jeremySal: ok I swear this is a bug
17:53jeremySal: yeah, I think it's printing in white text
17:53imirkin_: it should be bright white
17:53jeremySal: yeah but my background is bright white
17:53imirkin_: then that's a problem :)
17:54imirkin_: i don't think envydis is smart enough to take current settings into account
17:54jeremySal: haha okay
17:54drathir: jscinoz: wine-staging using?
17:54imirkin_: but who uses a white background anyways
17:54jeremySal: I'm colorblind, but not *that* colorblind
17:54jeremySal: imirkin: weirdos mostly
17:54imirkin_: not like it's the default for xterm :)
17:55imirkin_: good thing red-blue color blindness isn't a thing
17:55imirkin_: otherwise there'd be a lot more RGBA vs BGRA bugs
17:55jeremySal: so, I guess I should ask
17:55jeremySal: are there red vs green things in the output?
17:55jeremySal: the registers look different than the constants
17:56jeremySal: I'm guessing registers=red constants green?
17:56imirkin_: probably, but the actual colors aren't important
17:56imirkin_: iirc those are all blue, but i don't remember
17:56drathir: jscinoz: 3d fun?
17:56drathir: jscinoz: tabfail...
17:56drathir: jeremySal: ^
17:57jscinoz: drathir: yeah, wine1.8 with the staging patches. going to reboot momentarily though, since now not even that is working with DRI_PRIME=1, so i suspect maybe my card needs to be reinitialized or something
17:58jeremySal: wait, what is $r0?
17:58imirkin_: jeremySal: https://github.com/envytools/envytools/blob/master/util/colors.c#L45
17:58imirkin_: $r0 is register 0
17:58imirkin_: there are 255 registers, although $r255 is reserved in that it always contains the value 0
17:59jeremySal: ok because $r0 is definitely not blue to me
17:59imirkin_: it's intel-style syntax
17:59imirkin_: so mov $r0 $affinity <-> $r0 = $affinity
17:59jeremySal: yeah that's what I expected
17:59imirkin_: reg is color 31 :)
18:00jeremySal: I switched to a different theme where the address is now off-white instead of pure white :)
18:01jeremySal: ooh, and now I see that the address of like 0x58 is colored on that line as well, I was confused about that as well.
18:02jeremySal: what are the s registers?
18:03mwk: where do you see an s register?
18:03jeremySal: mov $r2 $s15
18:03jeremySal: When using the shader_interlock extension
18:03mwk: is it annoyingly bright red? seems like an unknown special register
18:04jeremySal: I'm colorblind, so annoying bright red looks the same as red/brown to me
18:04jeremySal: ohh, hmm no actually I'm pretty sure it doesn't
18:04mwk: these have individual names based on the purpose, so you should never see one unless it's an unknown
18:05mwk: yeah, it's an unknown special register
18:05mwk: if you have some idea what it is, patches welcome
18:05jeremySal: ok cool
18:12mwk: we really need to sync names between g80/gf100/gk110/gm107
18:12mwk: it's $srX on g80/gf100/gk110, but $sX on gm107
18:14mwk: and the actual sr names too, for that matter
18:15jeremySal: so in a fragment shader, if you access the register $r4 before it's written to, what value will it have?
18:16jeremySal: nevermind I realized that's not happening
18:16mwk: jeremySal: undefined
18:17mwk: whatever the previous shader left in there
18:17jeremySal: ok thanks
18:17mwk: except the registers are not actually fixed, but mapped to actual hw memory via a complex mapping
18:17mwk: so your $r4 might be some previous vertex shader's $r9
18:17jeremySal: oh that's interesting
18:17mwk: in other words - totally indeterminate
18:18mwk: yeah well
18:18mwk: each processor has a huge RAM used as backing storage for $r registers
18:19jeremySal: oh they're not actually registers?
18:19jeremySal: i mean like they're not part of the core?
18:19mwk: when it wants to run a shader block (a warp), it waits until there's enough space in that RAM, and then allocates a chunk of the right size and launches it
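The allocation mwk describes, and the reason an unwritten $r4 can hold some earlier shader's $r9, can be illustrated with a toy model. Everything concrete below is an assumption made for the sketch: the first-fit allocator, the register-major slot layout in `index`, the pool size and the register counts are all hypothetical, and the real hardware mapping is described in the chat only as "complex".

```python
WARP_SIZE = 32  # threads per warp, as described in the chat


class RegisterFile:
    """Toy model of one processor's $r backing RAM (layout is hypothetical)."""

    def __init__(self, total_slots):
        self.slots = [0] * total_slots      # contents survive across warps
        self.free = [(0, total_slots)]      # free chunks as (base, size)

    def try_launch(self, regs_per_thread):
        """Allocate a chunk for one warp; return its base, or None to wait."""
        need = regs_per_thread * WARP_SIZE
        for i, (base, size) in enumerate(self.free):
            if size >= need:
                if size == need:
                    del self.free[i]
                else:
                    self.free[i] = (base + need, size - need)
                return base
        return None

    def retire(self, base, regs_per_thread):
        """Give a finished warp's chunk back and merge adjacent free chunks."""
        self.free.append((base, regs_per_thread * WARP_SIZE))
        self.free.sort()
        merged = [self.free[0]]
        for b, s in self.free[1:]:
            last_b, last_s = merged[-1]
            if last_b + last_s == b:
                merged[-1] = (last_b, last_s + s)
            else:
                merged.append((b, s))
        self.free = merged

    def index(self, base, reg, lane):
        """Slot holding $r<reg> of one lane (assumed register-major layout)."""
        return base + reg * WARP_SIZE + lane


rf = RegisterFile(16 * WARP_SIZE)
vs = rf.try_launch(10)                 # "vertex shader" warp, $r0..$r9
blocked = rf.try_launch(8)             # does not fit yet: returns None
rf.slots[rf.index(vs, 9, 0)] = 0xDEAD  # the first warp writes its $r9
rf.retire(vs, 10)
other = rf.try_launch(5)               # some unrelated warp takes the start
fs = rf.try_launch(8)                  # "fragment shader" warp lands after it
stale = rf.slots[rf.index(fs, 4, 0)]   # its never-written $r4
print(blocked, hex(stale))
```

In this model the second launch has to wait until the first warp retires, and the later warp's untouched $r4 ends up on the same slot the earlier warp used for its $r9, so the value it reads is whatever was left there.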
18:19jeremySal: How do they lay it out so ram access is fast?
18:20mwk: "actually registers" is a fuzzy concept
18:20mwk: registers are RAM
18:20jeremySal: yeah for sure
18:20mwk: and the $r backing RAM is right there inside the processor
18:20jeremySal: I guess I thought of registers as the fastest memory
18:20jeremySal: where their access is hardcoded into the instruction set
18:20mwk: crazy fast
18:20mwk: the access path is also crazy wide
18:20jeremySal: but it's shared between the cores?
18:21mwk: it's per-core
18:21jeremySal: then why do they not have fixed registers?
18:22mwk: and the bus to the register bank is 512 bits wide... and there are 4 banks per core... and that was on Tesla, the numbers are now probably 8× bigger
18:22mwk: because they do a *lot* of stuff in parallel
18:22mwk: a single core can have thousands of threads running
18:22jeremySal: oh what?
18:23jeremySal: I guess I don't really know about the architecture
18:23jeremySal: I just assumed it was a bunch of really simple CPU-like cores
18:23jeremySal: running the same code
18:23jeremySal: but that's probably wrong
18:23mwk: okay, bear with me
18:23mwk: the GPU has between 1 and 16 processors
18:24mwk: known as SMs (streaming multiprocessors), or MPs, or whatever
18:24mwk: roughly corresponding to a CPU core, but only roughly
18:25mwk: each of these has up to 48 "warps" running in parallel, similar to how hyperthreading CPUs have 2 or 4 threads running in parallel - as in, every cycle the CPU decides on which thread to run
18:26mwk: this is meant to hide the latency of instructions - when one warp is waiting for a load from memory, other warps can be busy on the processor
18:26imirkin: mwk: yeah, ben did the gm107 one to be close to nvdisasm
18:26mwk: and now the big trick - each "warp" is up to 32 threads, running in lockstep
18:26imirkin: mwk: not a huge fan of how it turned out but... wtv
18:27mwk: that works a bit like SIMD on a CPU, but only a bit
18:27jeremySal: is it more or less powerful than SIMD?
18:27mwk: because the GPU tries hard to give you the illusion of independent threads
18:27mwk: in reality, the threads in a warp have to execute the same instructions
18:28jeremySal: so this is where predicated instructions come in?
18:28mwk: this is meant to keep the threads executing the same path
18:28mwk: if you do a conditional branch, and the threads in a warp don't agree on the direction, the warp will be split - some threads will become inactive, and will be resumed later once the first group finishes
18:29mwk: which is why divergent conditional branches are expensive, and there are a *lot* of tricks that are meant to rejoin threads on a single execution path after such a split
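[Editor's sketch, not part of the original conversation] The cost of a divergent branch described above can be illustrated with a toy model in Python. It assumes a 32-thread warp and simply counts issue slots: if all threads agree, only one side of the branch is issued; if they disagree, both sides are issued one after the other with part of the warp inactive. The function name `issue_slots` and the cost model are hypothetical and ignore everything real hardware does to rejoin threads early.

```python
WARP_SIZE = 32  # threads per warp, as stated in the discussion


def issue_slots(branch_taken, then_len, else_len):
    """Count instruction issue slots one warp spends on an if/else.

    branch_taken: one bool per thread, True if that thread takes the branch.
    then_len, else_len: number of instructions on each side of the branch.
    """
    assert len(branch_taken) == WARP_SIZE
    slots = 0
    if any(branch_taken):       # at least one thread runs the "then" side
        slots += then_len
    if not all(branch_taken):   # at least one thread runs the "else" side
        slots += else_len
    return slots


uniform = [True] * WARP_SIZE
divergent = [lane % 2 == 0 for lane in range(WARP_SIZE)]
print(issue_slots(uniform, 10, 20))    # only the "then" side is issued
print(issue_slots(divergent, 10, 20))  # both sides are issued back to back
```

In this model the divergent warp pays for both paths even though each thread only needs one of them.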
18:29mwk: so - you can have thousands of threads executing on an SM
18:30jeremySal: I see, so it's not that every single instruction is predicated, you can just put the core in an "inactive" state
18:30mwk: that means, if you want 8 GPRs for your program, you really have to have, say, 8k registers on the processor
18:30jeremySal: what is an SM?
18:31mwk: 03:24:02 mwk: known as SMs (streaming multiprocessors), or MPs, or whatever
18:31mwk: but 8 GPRs is ridiculously few, some programs may require hundreds of GPRs
18:31mwk: but 100 GPRs * 1000 threads is way too many to fit on an SM
18:32mwk: so - they made GPR count an adjustable parameter
18:32jeremySal: what is a GPR?
18:32mwk: if you choose 8 GPRs, you can run lots of threads in parallel, since they'll all fit in the 64k registers they actually have on the chip
18:32mwk: general purpose register
18:33mwk: but if you choose 128 GPRs, the GPU will limit simultaneously running threads (warps really), because otherwise it'd run out of registers
18:33mwk: this is why there has to be dynamic allocation of the GPRs in a huge RAM
18:34mwk: oh, and for 3d work, each type of shader can have a different amount of needed GPRs, complicating the equation
18:34mwk: vertex shader needs 8 GPRs, pixel needs 4, geometry needs 11
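[Editor's sketch, not part of the original conversation] The trade-off between GPR count and parallelism can be worked through numerically. The Python below assumes the figures quoted in the discussion (a 64k-entry register file, 32 threads per warp, up to 48 warps per SM); these were mentioned for possibly different hardware generations, and real GPUs also allocate registers in coarser units, so this is only a rough model. The function name `max_resident_warps` is hypothetical.

```python
def max_resident_warps(gprs_per_thread,
                       regfile_regs=64 * 1024,  # "64k registers" from the chat
                       warp_size=32,            # threads per warp
                       warp_limit=48):          # "up to 48 warps" per SM
    """Return how many warps fit on one SM for a given per-thread GPR count."""
    regs_per_warp = gprs_per_thread * warp_size
    # The register file bounds the warp count, and so does the scheduler limit.
    return min(warp_limit, regfile_regs // regs_per_warp)


for gprs in (8, 32, 128):
    warps = max_resident_warps(gprs)
    print(gprs, "GPRs ->", warps, "warps,", warps * 32, "threads")
```

With 8 GPRs the scheduler limit is what caps the warp count; with 128 GPRs the register file becomes the limit and only 16 warps fit.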
18:35jeremySal: so all the types of opengl shaders are the same from the hardware point of view?
18:35jeremySal: there aren't special processors for the different types?
18:35mwk: of course not
18:35mwk: the hardware does differentiate between them
18:35mwk: but they do execute on the same processors
18:36mwk: with slight differences in instruction set
18:36mwk: the "interpolate input" instruction only works in pixel shaders, for instance
18:38mwk: imirkin: tbh I like fadd/dadd/iadd better than add b32/f32/f64
18:38mwk: but there are some annoyances in there too
18:39mwk: still, I suppose it'd be better to follow nv... I started using nv names in my Tesla tests, I'm considering stuffing them in the docs/envydis too
18:39imirkin: well if you want to normalize it, go for it
18:39imirkin: just... please update the library code in mesa if you do
18:39imirkin: and the ddx
18:41mwk: right... well, I'll see about that
18:58drathir: imirkin: are there any other ways to debug than dmesg to find where access is rejected with mpv+vdpau?
18:58imirkin: drathir: strace -f -e open
18:58jeremySal: mwk: thanks for the explanation
18:59drathir: I'd like to catch as many logs as possible before giving the pc back to the customer ;p
18:59jeremySal: What are the c0[0x4260] etc addresses?
19:00imirkin: constbuf 0, address 0x4260
19:01jeremySal: imirkin: so like constants provided by the cpu?
19:01jeremySal: could you find them in the demmt trace?
19:02jeremySal: like the values they are set to?
19:23jeremySal: lop pass_b 0x1 $r4 0x0 inv $r4
19:23jeremySal: what does the 0x1 and 0x0 mean in this case?
20:13imirkin: constants are more like uniforms in GL
20:14imirkin: so c0 might correspond to the glUniform* values, or ubo 0... depends
20:14imirkin: 0x1 is probably PT (i.e. the always-true predicate, used as a dummy destination)
20:14imirkin: 0x0 is probably the always-zero register (RZ, aka R255). note that this logic op always passes the "second" value, so it doesn't matter what the first arg is
20:40jeremySal: imirkin: but wouldn't that imply there are four arguments? 0x1 $r4 0x0 and (~$r4)
20:41imirkin: 0x1 and $r4 are destinations
20:41imirkin: 0x0 and inv $r4 are the sources
20:42jeremySal: oh huh
20:42jeremySal: so it's like a simd operation?
20:42imirkin: it's a 2-source op
20:43jeremySal: yeah that makes sense, but a 2-destination op doesn't make sense to me
20:43imirkin: there are a bunch of logic ops... like "and", etc. this one is "pass b" aka second arg
20:43imirkin: the predicate is set based on some rules
20:43imirkin: i'm guessing based on whether the result is 0 or not
20:44jeremySal: I see
20:44jeremySal: is there no "NOT" instruction?
20:44imirkin: there is... this is it :)
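[Editor's sketch, not part of the original conversation] The reading of `lop pass_b 0x1 $r4 0x0 inv $r4` given above can be modeled in Python: the op ignores the first source, passes the second through, and `inv` inverts it bitwise, so the net effect is a 32-bit NOT. The function name `lop_pass_b` is hypothetical, and the predicate rule (result non-zero) is only the guess made in the conversation, not confirmed behaviour.

```python
MASK32 = 0xFFFFFFFF


def lop_pass_b(src_a, src_b, invert_b=False):
    """Model of a 'pass_b' logic op on 32-bit values.

    src_a is ignored (the op always passes the second source).
    Returns (result, predicate); the predicate rule is an assumption.
    """
    result = (~src_b if invert_b else src_b) & MASK32
    predicate = result != 0  # guessed rule: set when the result is non-zero
    return result, predicate


# lop pass_b <pred> $r4 0x0 inv $r4  ->  $r4 = ~$r4
r4 = 0x0000FFFF
r4, p = lop_pass_b(0x0, r4, invert_b=True)
print(hex(r4), p)
```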
20:47jeremySal: isetp ne u32 and $p0 0x1 $r4 0x0 0x1
20:47jeremySal: so this would set the predicate p0
20:48jeremySal: but not sure how it decides?
20:49imirkin: if $r0 != 0
21:11jeremySal: imirkin: I assume you mean if $r4 !=0?
21:11jeremySal: also how would I decode the 0x1s?
21:13imirkin: yes, that is what i meant.
21:13imirkin: 0x1 is generally P7, aka always-true
21:13imirkin: so like... isetp sets *2* predicates somehow
21:13imirkin: i haven't really looked into whether it sets them to the same value or what
21:14imirkin: and it's isetp *and*, so the result is actually and'd with the last arg, which is always-true
21:14imirkin: but it doesn't have to be
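[Editor's sketch, not part of the original conversation] The `isetp ne u32 and $p0 0x1 $r4 0x0 0x1` line can be read as "compare two unsigned 32-bit values, then AND the outcome with one more predicate". The Python below models only the first destination predicate; what the second destination receives is left open in the conversation and is not modeled. The function name `isetp_and` and its argument order are hypothetical.

```python
import operator

MASK32 = 0xFFFFFFFF
_COMPARE = {"eq": operator.eq, "ne": operator.ne}


def isetp_and(cmp, a, b, extra_pred=True):
    """Unsigned 32-bit compare, AND-ed with an extra predicate.

    extra_pred plays the role of the last argument; passing True
    corresponds to the always-true predicate used in the trace.
    """
    outcome = _COMPARE[cmp](a & MASK32, b & MASK32)
    return outcome and extra_pred


# isetp ne u32 and $p0 <PT> $r4 0x0 <PT>  ->  $p0 = ($r4 != 0) and True
print(isetp_and("ne", 0xFFFFFFFE, 0x0))
print(isetp_and("ne", 0x0, 0x0))
```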
21:19jeremySal: how big is the thread_kill register?
21:23imirkin: theoretically 32-bit, but it's only ever 1 or 0
21:23imirkin: (but obviously each thread in the warp can have a diff value)
21:25jeremySal: so why would they load it, invert it, and then check if it's not equal to zero?
21:25jeremySal: 000000d8: 01370004 f0c80000 C mov $r4 $thread_kill
21:25jeremySal: 000000e0: fec0072f 001fb401 sched 0x72f 0xff6 0x7ed
21:25jeremySal: 000000e8: 00070005 f0c80000 mov $r5 $laneid
21:25jeremySal: 000000f0: 0047ff04 5c470700 lop pass_b 0x1 $r4 0x0 inv $r4
21:25jeremySal: 000000f8: 0ff70407 5b6a0380 isetp ne u32 and $p0 0x1 $r4 0x0 0x1
21:25imirkin: because they're adding it in at the wrong point in their optimizer flow
21:26imirkin: and it doesn't end up going through the thing that would have figured out they don't have to do something dumb like that
21:26imirkin: or i'm the idiot :)
21:26jeremySal: so wait the f0 instructions
21:27jeremySal: inv is logical not or boolean not?
21:28jeremySal: Also "vote any $r4 0x1 $p0"
21:28imirkin: aren't those the same?
21:28imirkin: it's a bitwise not though
21:28jeremySal: what does the 0x1 do there?
21:28jeremySal: yeah I meant bitwise
21:28jeremySal: terminology skipped my brain
21:29imirkin: again... vote can produce a predicate or a register value
21:29imirkin: in this case the result goes into $r4 and $p7. $p7 == 0x1
21:29imirkin: since it's always-true
21:29jeremySal: i didn't know if it was comparing $p0 to 0x1
21:30imirkin: it looks at $p0 in every thread
21:30imirkin: and if *any* of them are true, the $r4 gets set to 0xffffffff (i think)
21:30imirkin: or maybe 1
21:32jeremySal: flo instruction?
21:33imirkin: find low bit
21:35jeremySal: what would cause thread_kill to be set? if there is no mov to that register, is there something else going on?
21:36imirkin: if you kill the thread (aka discard in glsl)
21:36imirkin: also if it's a helper invocation
21:36imirkin: like if you have a 2x2 quad and one of the fragments isn't covered
21:36imirkin: but it still has to run as a helper invocation to make derivatives & co work
21:38jeremySal: doesn't discard occur in the shader?
21:38jeremySal: what instruction would cause that?
21:50imirkin: kill :)
21:51jeremySal: so if I don't see a kill instruction in the shader code
21:51jeremySal: that would imply it's only set based on being a helper invocation?
21:51jeremySal: Also, I think the vote instruction sets bits according to which threads agree
21:52imirkin: that might be one of the modes
21:52jeremySal: Because it then immediately executes flo on the result
21:52imirkin: but vote any/vote all don't i think
21:52jeremySal: and compares it to the lane id
21:52jeremySal: which would be basically
21:52jeremySal: the "first" thread to vote yes
21:52jeremySal: if I'm guessing correctly
21:53jeremySal: 00000108: 00070004 50d9e000 vote any $r4 0x1 $p0
21:53jeremySal: 00000110: 00470004 5c300000 flo u32 $r4 $r4
21:53jeremySal: 00000118: 00570407 5b640380 isetp eq u32 and $p0 0x1 $r4 $r5 0x1
21:55jeremySal: ptx has the vote instruction, but I really can't find any documentation about it?
21:58jeremySal: ah nevermind I found it
21:58imirkin: you could be right, dunno
21:59jeremySal: hmm so there is a "ballot mode" that does this
21:59jeremySal: according to the ptx docs
21:59jeremySal: but demmt marks the instruction as vote any
21:59jeremySal: maybe it's decoding the instruction wrong?
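[Editor's sketch, not part of the original conversation] jeremySal's hypothesis about the `vote` / `flo` / `isetp eq` sequence can be modeled in Python: the vote builds a bitmask with one bit per lane whose predicate is true (the "ballot" reading), `flo` picks one set bit, and each lane compares that index with its own lane id, so exactly one lane ends up with a true predicate. This sketch follows the "find low bit" gloss given in the chat; whether the hardware picks the lowest or the highest set bit is not settled in the conversation, and all function names here are hypothetical.

```python
WARP_SIZE = 32


def ballot(preds):
    """Build a bitmask with bit N set when lane N's predicate is true."""
    mask = 0
    for lane, pred in enumerate(preds):
        if pred:
            mask |= 1 << lane
    return mask


def lowest_set_bit(mask):
    """Index of the lowest set bit, or -1 if no bit is set."""
    return (mask & -mask).bit_length() - 1


def elected_lanes(preds):
    """Lanes whose lane id equals the bit index picked from the ballot."""
    first = lowest_set_bit(ballot(preds))
    return [lane for lane in range(len(preds)) if lane == first]


preds = [lane in (3, 7) for lane in range(WARP_SIZE)]
print(hex(ballot(preds)), elected_lanes(preds))
```

Under this reading the sequence elects a single lane among those that voted yes, which would fit a use such as picking one thread to do work on behalf of the warp.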
22:01skeggsb: imirkin: i've got an initial version of maxwell tic working on gm107 btw, can render xonotic more or less, but it makes piglit very unhappy still :P
22:01skeggsb: but - progress
22:02imirkin: skeggsb: ok cool
22:02skeggsb: hopefully it'll all "just work" when we can switch on gm20x :)
22:02imirkin: skeggsb: if you want me to have a look, let me know
22:02imirkin: skeggsb: you should be able to test on GM107 though
22:02skeggsb: yeah, i'm using a gm107 for it
22:03imirkin: someone was having trouble with it earlier... everything was hanging
22:03skeggsb: oh, weird. mine is seemingly functional
22:03imirkin: yeah, i'm sure there's some variability going on :)
22:04imirkin: if you search for your nick in the scrollback, i think i pinged you :)
22:04imirkin: it was with PRIME
22:04imirkin: and dri3
22:04imirkin: dunno if any of that matters
22:04jeremySal: "ballot" doesn't seem to show up in the envytools source except in ptxgen/ptxgen
22:32skeggsb: well, that's a more acceptable pass rate to begin with
22:32skeggsb:leaves piglit running in screen and goes to get beer
22:35imirkin: that'll make any additional piglit fails disappear...