02:29 pabs3: my GPU got into a state where plain X11 works, but anything OpenGL based does not (and compositor things do not)
02:30 pabs3: anyone have any ideas how to debug this?
02:31 pabs3: there is nothing mentioned in dmesg
02:39 gnarface: maybe you forgot to install mesa?
02:40 gnarface: pabs3: maybe you forgot to install mesa?
02:40 gnarface: even glxgears doesn't run?
02:40 pabs3: mesa is installed and OpenGL was working yesterday under the same X11 server
02:40 pabs3: no, all of these don't work: glxdemo glxgears glxheads glxinfo
02:41 pabs3: er, glxinfo does work
02:41 pabs3: the weird thing is that the initial frame gets rendered but then the apps freeze and no more rendering happens
02:41 gnarface: try these commands:
02:42 gnarface: glxinfo |grep direct\ rendering -i
02:42 gnarface: glxinfo |grep opengl\ version -i
02:42 imirkin: pabs3: pastebin xorg log
02:50 pabs3: http://paste.debian.net/1027999/ https://people.debian.org/~pabs/tmp/Xorg.0.log
02:51 imirkin: pabs3: and pastebin glxinfo?
02:53 imirkin: pabs3: oh wait, the first frame issue...
02:53 imirkin: yeah, i've seen that
02:53 pabs3: http://paste.debian.net/1028000/
02:53 imirkin: try this: vblank_mode=0 glxgears
02:54 imirkin: the issue appears to be, in part, that some events are stuck on the event queue
02:54 imirkin: although the reason they're stuck could be the underlying reason this comes up
02:54 pabs3: that works
02:55 imirkin: basically vblank events get semi-fubar'd
02:55 HdkR: I hear that vblank is overrated anyway
02:55 imirkin: not helping matters is the fact that vblank handling was legitimately broken
02:55 imirkin: in some kernels
02:55 imirkin: after you hit 32 bits worth of them
02:55 pabs3: I'm on a recent Linux kernel 4.16.12-1
02:55 imirkin: the vblank counter got expanded to 64-bit, but some things got left behind
02:56 imirkin: ah, that should be good, i think... let's check
02:56 imirkin: drm/vblank: Data type fixes for 64-bit vblank sequences.
02:56 pabs3: so the point at which this issue starts is when the vblank counter overflows?
02:56 imirkin: well, AN issue
02:57 imirkin: but e.g. right now i'm having an issue where running glxgears causes a TON of cpu usage in X
02:57 imirkin: which seems to be because some events are getting stuck
02:57 imirkin: and a linked list which is meant to be short becomes ... not short.
02:57 imirkin: how long has your box been up?
02:57 pabs3: I see that here, vblank_mode=0 glxgears gives 42% CPU usage in Xorg
02:58 imirkin: well - that makes sense
02:58 imirkin: you're getting thousands of fps
02:58 imirkin: x has to composite all that, it ain't free
02:58 imirkin: i'm seeing it at 60fps :)
02:58 pabs3: 7 days, would have been in GNOME shell most of that time. the freeze thing started yesterday
02:58 pabs3: ah :)
02:58 imirkin: 7 days isn't enough.
02:58 pabs3: 41707 frames in 5.0 seconds = 8341.360 FPS
02:59 imirkin: [for wrap-around]
02:59 imirkin: although perhaps i don't have an ideal handle on how that counter works
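For scale, the wrap-around time imirkin alludes to can be estimated with simple arithmetic. This is a minimal sketch, assuming a counter that is 32 bits wide and ticks exactly once per refresh at a fixed 60 Hz. `days_until_wrap` is a hypothetical helper, not a kernel interface, and the real counter may behave differently.

```python
# Rough estimate of how long a vblank counter takes to wrap.
# Assumption: the counter increments once per refresh at a constant rate.

SECONDS_PER_DAY = 24 * 60 * 60


def days_until_wrap(counter_bits: int, refresh_hz: float) -> float:
    """Days of continuous scanout before a counter of this width wraps."""
    return (2 ** counter_bits) / refresh_hz / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"32-bit counter at 60 Hz: {days_until_wrap(32, 60.0):.1f} days")
```

Under those assumptions a 32-bit counter lasts a bit over 828 days, so 7 days of uptime is far too short to reach a wrap.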
03:00 pabs3: can the apps being run affect the time-frame? (0ad for example)
03:00 imirkin: i'm at 37 days of uptime, both kernel and Xorg
03:00 imirkin: and 90 isn't uncommon for me
03:00 imirkin: so i run into fun issues
03:08 pabs3: imirkin: is upgrading mesa likely to fix this? I see 8.1 in experimental. or is the vblank counter in Linux?
03:08 imirkin: no
03:08 imirkin: it's a kernel change
03:09 imirkin: i'm still fetching the latest stable tree ... for some reason it's coming in at 6KB/s
03:09 pabs3: ok
03:14 gnarface: there's been something wrong with routes across the atlantic for me for the past several weeks
03:15 gnarface: i'm getting an average speed of *almost* twice that
03:16 imirkin: it's sped up for me too. 10KB/s range now :)
03:25 imirkin: pabs3: looks like the vblank fix went into v4.16.13
03:25 imirkin: so ... off-by-one :)
03:25 pabs3: aha, thanks!
03:26 imirkin: but i'm running with that fix, and i still see other issues. so ... not a panacea
03:26 pabs3: ah, there is an MR to update unstable to 4.16.13, so it should be available to me soon https://deb.li/EfKU
03:28 Subv: hey, has anyone else experienced a weird envydis issue where it'd stop decoding after a while and output "??????0d ???????? ??? [unknown: ??????0d ????????] [incomplete] [unknown instruction]" ?
03:29 imirkin: happens when you feed bs into it
03:29 imirkin: make the file available, i can have a look
03:30 imirkin: recent versions have gotten picky about sched's on maxwell - every 4th op is decoded as "sched" data
03:34 HdkR: So picky :P
03:36 Subv: imirkin: https://gist.github.com/Subv/e20d08ce56dbad650313a7ad7696655f this is the file
03:36 Subv: it stops at offset 00000268 for some reason
03:36 Subv: but that's clearly an f2i
03:37 imirkin: dunno. decodes fine for me.
03:37 Subv: what
03:37 imirkin: https://hastebin.com/jipekuvuku.bash
03:38 Subv: thanks
03:38 Subv: i guess my envydis version is just broken
03:38 imirkin: didn't you say you were trying it on msys or something?
03:39 Subv: yeah i'm on windows
03:39 Subv: i guess that's not actually supported and may have weird bugs laying around
03:39 imirkin: well - at the very least - untested
03:39 imirkin: can't easily imagine what *platform* issue would cause this...
03:43 HdkR: imirkin: What is the C and grouping in your disassembly?
03:43 imirkin: branch destination
03:43 imirkin: branch targets get an extra newline before them
03:43 HdkR: Ah, for ones that aren't indirect
03:43 imirkin: (C = call? something like that)
03:44 imirkin: well, indirect branches aren't TOO frequent
03:44 HdkR: Until you hit subroutines or cuda I guess
03:44 imirkin: have you ever seen subroutines get used in GL?
03:44 HdkR: Yes actually
03:44 imirkin: did you write the code? :p
03:44 HdkR: and it wasn't even me using it
03:44 imirkin: hehe
03:44 HdkR: lol
03:44 HdkR: I knew it
03:44 HdkR: :P
03:45 imirkin: and was it a good idea?
03:45 imirkin: perf-wise
03:45 HdkR: For the one I wrote, no. For the one that I saw, yes
03:45 imirkin: the RA implications are pretty painful
03:46 HdkR: The case I saw, each subroutine type uses around the same register amounts
03:46 HdkR: So it worked
03:47 HdkR: er, each of the subroutines per type
03:47 imirkin: and didn't have 20 diff callsites
03:48 HdkR: Was a handful per shader if I recall
03:50 imirkin: it's really multiple callsites which causes pain
03:51 imirkin: since they all have to be compatible
03:51 HdkR: Right
03:52 HdkR: It's quite niche where subroutines would actually end up being better
04:01 Subv:prays no games actually use that
04:03 HdkR: hehe
04:03 HdkR: bwehehehehe
04:03 HdkR: hahahahaha
04:04 imirkin: most i've seen is a game that used the subroutine type, but didn't actually define any implementations or make any calls to it
04:04 imirkin: so you had to support it in the glsl
04:04 imirkin: but nothing more
04:07 Subv: this is kind of offtopic, but i wonder, is there any particular reason (in the hardware) why the OpenGL spec doesn't allow you to use the std430 layout modifier in UBOs?
04:07 HdkR: imirkin: I wouldn't call any exclusive titles on the console especially normal. :P
04:08 HdkR: Subv: Limitations on some hardware that can't support it
04:08 imirkin: dunno - can't really think what those might be...
04:08 HdkR: and I don't think anyone cared to make an extension after the fact
04:09 imirkin: std430 came in with ssbo's... what hw would have trouble with those packing rules on ubo's?
04:10 HdkR:forgets which
04:14 HdkR: I guess you could probably find out if you do some weird packing, enable the packed qualifier and query the locations
04:14 HdkR: Would just need to run it through a ton of different hardware
04:19 HdkR: Subv: Dang it, that is an address calculation happening in there :P
04:20 Subv: indeed, i guess that's what the iscadd was for, heh, just didn't see it because i hadn't actually disassembled the whole shader
04:20 Subv: also, 64-bit loads
04:21 HdkR: woop woop, 64bit loads
04:21 HdkR: Call the awesome police
04:22 HdkR: Wait until you find the 96bit and 128bit loads. Blow your mind
04:23 Subv: all these fancy instructions and i can't even get nvcc to generate a psetp
04:28 HdkR: Get imirkin to support cuda and nouveauCC and set up a dag pattern to match some wacky sequence to generate it :D
07:09 mwk:wonders how nvidia came up with the idea that 0x1d4 entries is exactly the right size for vertex program constant memory
07:10 karolherbst: well "468"
07:11 karolherbst: maybe 0x2c stuff is used for internal stuff though?
07:11 mwk: nope
07:12 mwk: this is the total RAM size
07:13 mwk: unless.... hmmm
07:23 mwk: ugh fuck.
07:26 karolherbst: mwk: so what did you figure out?
07:26 mwk: karolherbst: 0x1d4 is actually ugly because it *includes* internal stuff
07:26 karolherbst: :)
07:26 mwk: except I can't quite figure out the layout
07:27 mwk: also, found a bug in nv30 gallium driver
07:27 mwk: probably nv40 as well
07:30 mwk: imirkin: so, here goes something bugreport-ish
07:30 mwk: the vertex program const space contains more than just user consts
07:31 mwk: on Kelvin, 0x00:0x60 are params for fixed-function T&L, 0x60:0xc0 are user-defined; Rankine has 0x00:0x9c fixed, 0x9c:0x19c user-defined; and I can't quite figure out the boundaries on Curie
07:32 mwk: the distinction is sort-of enforced in hardware: if you do an indexed const access on Kelvin/Rankine, by default it'll return 0 if the address is in the "fixed-function" range
07:33 mwk: though if you do an absolute const access, Rankine allows it
07:33 mwk: the problem is, nv30 gallium driver doesn't care about it, and stores its consts in 0:0x100 instead of 0x9c:0x19c
07:34 mwk: it doesn't support indexed accesses AFAICT, so the problem is mostly unnoticeable
07:34 mwk: but there's one unobvious piece of fixed-function transform used: the viewport scale & offset
07:35 mwk: these are implicitly performed by the same processor that executes vertex programs and the parameters are stored in the fixed-function area of const space
07:35 mwk: specifically, at slots 0x76 (viewport scale) and 0x77 (viewport translate)
07:36 mwk: so if vertex programs use enough consts, the nv30 gallium driver will overwrite viewport transform params with user shader params, or the other way around
07:37 mwk: on Curie, something similar is probably happening, but I haven't figured out the exact offsets yet
07:38 skeggsb_: i wonder why they even made those accessible from the class if that's the case
07:38 mwk: proper fix: don't use addresses 0:0x9c on Rankine
07:38 mwk: or, if you really want to, enable the bit that allows indirect access to that range, and just avoid allocating 0x76 & 0x77
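The overlap mwk describes on Rankine can be checked mechanically. This is a minimal sketch using only the ranges and slot numbers quoted in the messages above. The helper name `clobbered_viewport_slots` is hypothetical and is not taken from the nv30 driver.

```python
# Rankine vertex-program const space, using the ranges quoted in the chat.
FIXED_FUNCTION = range(0x00, 0x9C)   # fixed-function T&L parameters
USER_DEFINED = range(0x9C, 0x19C)    # intended home for user consts
VIEWPORT_SLOTS = {0x76: "viewport scale", 0x77: "viewport translate"}


def clobbered_viewport_slots(first_slot: int, const_count: int) -> dict:
    """Return the viewport slots covered by consts stored from first_slot onward."""
    used = range(first_slot, first_slot + const_count)
    return {slot: name for slot, name in VIEWPORT_SLOTS.items() if slot in used}


if __name__ == "__main__":
    # Placement attributed to the nv30 gallium driver: 0:0x100.
    print(clobbered_viewport_slots(0x00, 0x100))
    # Placement mwk proposes: start at the user-defined base.
    print(clobbered_viewport_slots(USER_DEFINED.start, 0x100))
```

With consts starting at 0, a shader only runs into trouble once it uses more than 0x76 slots, which matches the "if vertex programs use enough consts" condition.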
07:39 mwk: skeggsb_: on Kelvin, they sort of didn't
07:39 mwk: but... here's the thing
07:40 mwk: Kelvin does *not* do implicit viewport transform if vertex program is enabled
07:40 mwk: so the driver implicitly has to append two instructions to every shader, which read from the fixed-function const mem slots
07:41 mwk: so the fixed-function memory has to be visible through the class, because you need to access it through the shader tail
07:42 mwk: also, exposing it in instruction encoding has a benefit: you can have negative offsets encoded in instructions
07:42 mwk: on kelvin, if you write c[A0.x-0x20], the driver actually encodes base address of 0x40 in the instruction
07:42 mwk: which is user_base-0x20
07:43 mwk: and since the fixed function slots read as 0, this matches NV_vertex_program semantics for out of bound accesses
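The negative-offset trick can be modelled in a few lines. This is a sketch under stated assumptions: the Kelvin user range is 0x60:0xc0 as quoted above, const slots are simplified to single numbers rather than vectors, and any indexed read outside the user range is assumed to return 0. `encode_base` and `indexed_read` are hypothetical names, not driver or hardware interfaces.

```python
KELVIN_USER_BASE = 0x60
KELVIN_USER_END = 0xC0


def encode_base(offset: int) -> int:
    """Base address placed in the instruction for c[A0.x + offset]."""
    return KELVIN_USER_BASE + offset


def indexed_read(const_mem, base: int, a0x: int):
    """Model of an indexed read when only the user range is visible.

    Assumption: addresses outside 0x60:0xc0 read as 0.
    """
    addr = base + a0x
    if KELVIN_USER_BASE <= addr < KELVIN_USER_END:
        return const_mem[addr]
    return 0


if __name__ == "__main__":
    mem = list(range(KELVIN_USER_END))   # slot n holds the value n
    base = encode_base(-0x20)            # c[A0.x-0x20]
    print(hex(base))                     # the 0x40 from the example above
    print(indexed_read(mem, base, 0x20)) # lands on the first user slot
    print(indexed_read(mem, base, 0x10)) # lands in the fixed-function range
```

In this model a program that indexes below its own first constant gets 0 back instead of fixed-function state, which is the out-of-bounds behavior mwk attributes to NV_vertex_program.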
07:45 mwk: skeggsb_: I suppose it also has the benefit of making other fixed-function state directly accessible to vertex programs if they need to
07:45 skeggsb_: makes sense in a subtle way :P
07:46 mwk: also, the weird Kelvin behavior is enshrined in NV_vertex_program1_1
07:47 mwk: the exact behavior of Kelvin is: if the magic "access control" bit is set to "all addresses", both absolute and indexed accesses can see the whole memory
07:47 mwk: but if it's set to "user only", indexed accesses can only see 0x60:0xc0, and absolute can only see 0x3a [aka viewport scale], 0x3b [aka viewport offset], and 0x60:0xc0
07:48 mwk: but NV made a NV_position_invariant option for vertex programs, which is basically supposed to do bit-exact same position transform as fixed-function
07:48 mwk: which means it needs to read the FF matrices
07:48 mwk: from https://www.khronos.org/registry/OpenGL/extensions/NV/NV_vertex_program1_1.txt:
07:48 mwk: " Is relative addressing available to position-invariant version 1.1
07:48 mwk: vertex programs?
07:48 mwk: RESOLUTION: No. This reflects a hardware restriction.
07:48 mwk: "
07:48 mwk: which is why they changed how access control works on Rankine, to only apply to indexed accesses
07:49 mwk: note that NV_vertex_program2 does *not* have this restriction :D
07:50 karolherbst: mwk: is that extension even exposed on all the hardware?
07:54 mwk: karolherbst: uh, sure
07:55 mwk: NV_vertex_program1_1 is on everything from NV20 up, NV_vertex_program2 is on everything from NV30 up
07:56 karolherbst: annoying
07:56 karolherbst: maybe the hardware limitation stuff referred to NV20 though?
07:56 mwk: well, duh
07:56 mwk: obviously it did
07:56 mwk: that's why it's gone in VP 2.0
08:07 mwk: eh, screw that
08:07 mwk: made enough text already
08:09 mwk: http://envytools.readthedocs.io/en/latest/hw/graph/xf/ctx.html#xfctx
08:09 mwk: FWIW, this is what the const space looks like
14:57 feaneron: what is missing from nouveau that is required for a vulkan driver?
15:02 imirkin_: feaneron: userspace va management
15:02 feaneron: userspace == mesa level?
15:02 imirkin_: well, non-kernel
15:03 imirkin_: that's what's missing from the kernel.
15:03 imirkin_: the ability to enable userspace to do that
15:03 imirkin_: on the userspace end, what's missing is a vulkan driver :)
15:03 feaneron: oh, it needs kernel work + a vulkan driver proper
15:04 feaneron: what does va stand for?
15:04 imirkin_: virtual address
15:06 imirkin_: vulkan can allocate memory, and then "place" a resource into that memory
15:06 imirkin_: however different resources need to have different PTE bits set, e.g. textures vs buffers vs etc
15:06 imirkin_: and there's no way for userspace to manage it like that right now
15:09 feaneron: pte?
15:09 pendingchaos: page table entry
15:10 feaneron: thanks
15:11 feaneron:needs to learn about all that
15:12 imirkin_: it'll be a steep learning curve...
15:12 imirkin_: if you've never heard of "PTE" and "VA" before, i strongly recommend first trying to understand how plain CPUs work
15:14 feaneron: i love graphics stuff, from window management to compositing to rendering to drivers, but it's been an awful lot of things to learn so far :)
15:15 feaneron: every time i ask something around, i end up with more stuff to look for
15:15 imirkin_: "virtual memory" has been around since 386's in the PC world (probably earlier on other chips, i'm not sure tbh...)
15:16 imirkin_: so not exactly a new concept
15:16 feaneron: perhaps a better approach would have been picking a super small stupid issue to work on, and make it happen
15:16 imirkin_: but a very important one
15:17 imirkin_: what GPU do you have?
15:17 feaneron: a kepler one
15:17 feaneron: nv106
15:18 feaneron: it's well supported by nouveau so far
15:18 imirkin_: yeah, kepler's the best gen
15:19 imirkin_: as far as things actually working
15:19 feaneron: i couldn't "fix my own problems" because, well, i honestly didn't have any so far
15:19 feaneron: except performance
15:19 imirkin_: you should be able to reclock it
15:19 imirkin_: although GK208's aren't exactly beasts of burden...
15:20 feaneron: but nobody sane should put newbies to work on performance, so i'm not working on that for now
15:20 imirkin_: how far off is the perf from what you'd expect?
15:21 feaneron: it's ~1/3 from the proprietary driver
15:21 imirkin_: with reclocking?
15:22 feaneron: yes
15:22 imirkin_: that's VERY surprising
15:22 imirkin_: you should be getting in the 60-80% range
15:23 feaneron: now that's surprising too!
15:25 feaneron: well, if anyone remembers a low-hanging fruit that a total noobie could work on, i'm open to suggestions
15:26 imirkin_: what's your skillset?
15:26 feaneron: heh i don't know how to answer that :) i know a bit of vulkan, opengl, and i'm regularly contributing to mutter's wayland backend
15:27 imirkin_: so... you know C and/or C++?
15:27 feaneron: which uses cogl, which uses egl/gles3
15:27 feaneron: yup
15:27 imirkin_: that's a good start.
15:29 imirkin_: lessseee here....
15:29 imirkin_: if you wanted to do something useful, you could do https://trello.com/c/jmxlZen9/156-fermi-kepler-image3d
15:30 imirkin_: this isn't so much hard as it is a matter of figuring out wtf nvidia does, and then doing that too
15:30 imirkin_: this one would be a bunch of compiler work: https://trello.com/c/XCm53EsR/129-nvc0-re-enable-memoryopt-for-patch-variables
15:32 feaneron: about the first one, the process would be something like: install prop drivers, record what it does, replicate in nouveau?
15:33 imirkin_: there's a minor step that you missed
15:33 imirkin_: right before "replicate in nouveau"
15:33 imirkin_: which is "analyze and understand how nvidia achieves this"
15:33 feaneron: figure out?
15:33 feaneron: aha :)
15:36 feaneron: the second one, is it about mesa's src/gallium/drivers/nouveau/nvc0 stuff?
15:37 imirkin_: it's about codegen
15:38 feaneron: ok, thanks for all the support. i'll dive into these ones
15:39 imirkin_: they're not really newbie-friendly tasks
15:39 imirkin_: but it's difficult to maintain a good set of newbie tasks
15:39 imirkin_: since they tend to get done.
15:40 feaneron: of course
23:11 BootI386: imirkin: Why is it disabled? https://trello.com/c/XCm53EsR/129-nvc0-re-enable-memoryopt-for-patch-variables
23:14 imirkin_: coz it doesn't differentiate patch vs non-patch
23:19 BootI386: Ok, thx
23:27 BootI386: Ugh
23:43 Subv: huh, in envydis, does "struct rbitfield s2020_bf = { { 20, 19, 56, 1 }, RBF_SIGNED };" refer to a 2's complement signed integer formed by concatenating [20:38,56], or does it refer to an unsigned integer in [20:38] with bit 56 saying whether to negate it or not?
23:45 imirkin_: more like take bits 38:20 as the low bits, and use bit 56 to fill the remaining bits.
23:46 imirkin_: i.e. either all 1's or all 0's
23:46 imirkin_: (which is not the same thing as negating)
23:48 Subv: that would be the same as constructing a signed 20-bit 2's complement with 56:56_38:20 and sign-extending it right?
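The two readings can be compared directly. This is a minimal sketch based only on the descriptions in the chat, not on the envydis source. All function names are hypothetical, and the 32-bit result width in `s2020_fill` is an assumption made for illustration.

```python
def extract(word: int, pos: int, length: int) -> int:
    """Return `length` bits of `word` starting at bit `pos`."""
    return (word >> pos) & ((1 << length) - 1)


def s2020_fill(word: int, width: int = 32) -> int:
    """Bits 38:20 are the low 19 bits; bit 56 fills every bit above them."""
    low = extract(word, 20, 19)
    fill = extract(word, 56, 1)
    high = (((1 << (width - 19)) - 1) << 19) if fill else 0
    raw = high | low
    # Interpret the result as a width-bit two's complement number.
    return raw - (1 << width) if raw >> (width - 1) else raw


def s2020_sign_extend(word: int) -> int:
    """Build the 20-bit value 56:56_38:20 and sign-extend it."""
    value = (extract(word, 56, 1) << 19) | extract(word, 20, 19)
    return value - (1 << 20) if value & (1 << 19) else value


def s2020_negate(word: int) -> int:
    """The sign-magnitude reading, shown only for contrast."""
    magnitude = extract(word, 20, 19)
    return -magnitude if extract(word, 56, 1) else magnitude


if __name__ == "__main__":
    word = (1 << 56) | (1 << 20)   # bit 56 set, low field equal to 1
    print(s2020_fill(word), s2020_sign_extend(word), s2020_negate(word))
```

In this model the fill and sign-extend readings agree on every input, while the sign-magnitude reading differs whenever bit 56 is set.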