01:38 nyef: Does a struct nouveau_fence in the gallium driver have any correlation to a struct nouveau_fence in the kernel?
01:39 nyef:is seeing a great number of things called "fence", with very little indication of how (or if!) they relate to each other.
02:03 imirkin: nyef: no
02:04 imirkin: that's an improvement i desperately want --
02:04 imirkin: being able to have userspace fences which can perform non-cpu-burning waits
02:04 imirkin: syncfd is the cool new thing for that
02:04 imirkin: [afaik]
02:05 airlied: syncobj
02:32 nyef: ... dear god, the gallium driver fence mechanism is *WHAT*?!?
02:36 nyef: Is it at all critical that the commands in the DMA buffer all be executed, or is "being able to re-use the buffer" an acceptable substitute?
02:37 nyef: Ugh. And the userland has to support "old" kernels "for a while", doesn't it?
02:37 nyef: Just like how the rule for the kernel is "thou shalt not break the userland".
03:14 nyef: ... This whole pushbuf / kick_notify thing is for this horrid, horrid busy-wait fence thing, isn't it?
03:31 nyef: And once that's sorted, then it more-or-less becomes a game of differential state tracking and some small locking bits.
09:33 crmlt: karolherbst: thanks for patch will give it try asap
09:44 nyef: Ahh... a laptop could easily have different stolen-memory requirements than a desktop on the same chipset. Okay, that's plausible.
09:45 nyef: Guess I'll need to do a new kernel for my MCP89 as well.
09:48 nyef: Hrm. Are the kernel fences polled, or interrupt-based? And if they're polled, why?
09:49 nyef: ... Backwards compatibility might be a good why, actually.
09:49 nyef: Well, a *believable* why, at least.
10:26 nyef: ... In xf86-video-nouveau, nv50_accel.c, NVAccelInitNV50TCL(), at line 182 (the "return FALSE;" just after a couple of pushbuf operations/checks, shouldn't there be a small run of nouveau_object_del() calls? If not, why not?
10:26 nyef: Ugh. And I unbalanced my parens).
11:02 pendingchaos: karolherbst: figured out how to fix the max-size's image2DMS subtest
11:02 pendingchaos: using TWO_D_NO_MIPMAP instead of TWO_D in the TIC fixes it
11:04 pendingchaos: (it doesn't work for image2DMSArray though)
11:35 crmlt: karolherbst: thanks for patch will give it try asap
12:21 kubast2_: 12/16 threaded timeout fuck
13:09 kubast2_: I guess I only need to self compile xf86-video-nouveau package iirc rn
14:00 kubast2_: So I randomly get a pass
14:00 kubast2_: On check()
14:05 nyef: ... hey, me? Remember which box I was setting up for full-stack nouveau development all those months ago? The MCP89, with the crash-happiness and all the rendering bugs? Why the HELL haven't I re-updated everything and finished setting it up yet?
14:08 karolherbst: pendingchaos: sad, maybe imirkin has an idea?
14:10 pendingchaos: the blob seems to use TWO_D_NO_MIPMAP for image2DArray too
14:13 karolherbst: pendingchaos: and they also just bump up the width/height, right?
14:14 pendingchaos: by ms_x/ms_y? yes
14:18 karolherbst: and they use TWO_D_NO_MIPMAP for 2dmsarray? Maybe there are more values and we just don't parse them or something
14:24 karolherbst: pendingchaos: do you know if the x = 0 row verifies correctly? or does piglit check 0,0,0,0 and then 1,0,0,0?
14:26 nyef: So, MCP89 DPort HPD tests... a new miserable experience.
14:26 nyef: The /good/ thing is that I haven't managed to lock up the computer yet.
14:26 pendingchaos: for both the image2DMS and image2DMSArray tests, the 1,0,0,0 pixel is the first one found to be incorrect
14:26 pendingchaos: so apparently 0,0,0,0 is fine
14:26 nyef: Less good? "link rate unsupported by sink"
14:27 pendingchaos: (that's without any changes such as using TWO_D_NO_MIPMAP)
14:30 pendingchaos: nothing in the TIC looks unusual other than the usage of TWO_D_NO_MIPMAP
14:33 karolherbst: pendingchaos: I think the actual issue is something besides that. 8x16384x8x1 fails, where test/8x8x16384x1 passes. That is odd enough on its own
14:34 pendingchaos: with 8x MSAA, 16384x8 would result in a 65536x16 texture, 8x16384 would result in a 32x32768 texture
14:34 pendingchaos: I think something is broken when a texture dimension is > 32768 or something
14:35 pendingchaos: and using TWO_D_NO_MIPMAP with the texture fixes the issue in the non-array case
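The size arithmetic pendingchaos describes can be sketched in C. This is only an illustration, not nouveau code: the per-axis scale factors (4 and 2 for 8x MSAA) are inferred from the sizes quoted above, the names `ms_storage_extent` and `exceeds_limit` are hypothetical, and the 32768 limit is the guess from the discussion rather than a documented hardware value.

```c
#include <stdint.h>

struct extent {
    uint32_t width;
    uint32_t height;
};

/* Hypothetical helper: scale the logical image size by the per-axis
 * sample factors to get the size of the backing storage. */
struct extent ms_storage_extent(uint32_t width, uint32_t height,
                                uint32_t scale_x, uint32_t scale_y)
{
    struct extent e = { width * scale_x, height * scale_y };
    return e;
}

/* Returns nonzero when either storage dimension is above the limit. */
int exceeds_limit(struct extent e, uint32_t limit)
{
    return e.width > limit || e.height > limit;
}
```

With factors 4 and 2, a 16384x8 image needs 65536x16 of storage and crosses a 32768 limit, while 8x16384 needs 32x32768 and does not, which matches the pass/fail pattern reported for the two piglit sizes.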
14:36 karolherbst: pendingchaos: it can be something dumb like glGetTexImage returning wrong data
14:36 kubast2_: nyef yeah I think I will settle it without linuxdrmnext for now lel
14:36 karolherbst: pendingchaos: ohh right, the msaa layout
14:37 pendingchaos: you can't use glGetTexImage with multisampled images
14:37 pendingchaos: the 16384x8/65536x16 image is only accessed through image load/store
14:38 nyef: I now have three test series blocked waiting on the same damned machine, which is busy updating things. /-:
14:38 karolherbst: ahh right
14:38 kubast2_: I can't compile libdrm-git nor lib32-drm/libdrm-git, and the ones in the non-AUR repos don't play nicely with linuxdrmnext and xf86-video-nouveau-git
14:39 kubast2_: *lib32-mesa
14:39 karolherbst: pendingchaos: actually, the "(1, 0, 0, 0)" is misleading. this is a check at x=0, y=0
14:40 karolherbst: the data should be something like: (r, r+1, r+2, r+3) with r = (x + y*width)*4
14:40 karolherbst: so (0, 1, 2, 3) at (0,0), (4, 5, 6, 7) at (1, 0).... (4096, 4097, 4098, 4099) at (0, 1)
14:41 karolherbst: where they draw into a 1024x1024 2d image
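The expected values karolherbst lists follow from a simple linear pattern. The sketch below reproduces it in C under the assumption that the base value is `(x + y*width)*4`, which is what the three examples imply for a width of 1024; the function name `expected_texel` is hypothetical and not taken from piglit.

```c
#include <stdint.h>

/* Fill out[0..3] with the four components expected at texel (x, y)
 * of an image that is `width` texels wide. */
void expected_texel(uint32_t x, uint32_t y, uint32_t width, uint32_t out[4])
{
    uint32_t base = (x + y * width) * 4;

    for (uint32_t i = 0; i < 4; i++)
        out[i] = base + i;
}
```

For a 1024-wide image this gives (0, 1, 2, 3) at (0, 0), (4, 5, 6, 7) at (1, 0) and (4096, 4097, 4098, 4099) at (0, 1).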
14:42 pendingchaos: looking at the code that prints "probe value at ..."
14:42 pendingchaos: it seems the first component is the sample index
14:42 karolherbst: pendingchaos: in the 8x16384x8x1 case it seems that nothing gets drawn at all
14:42 pendingchaos: the second is the x
14:42 karolherbst: so the 2d texture read from is filled with (0,0,0,0)
14:42 pendingchaos: and the third is probably the y
14:42 karolherbst: yeah
14:43 karolherbst: and third z/idx
14:44 karolherbst: anyway 16384*8*8 == 1024 * 1024, so I guess they just copy the pixels out and simply have a different layout of the data
14:44 karolherbst: sooo
14:44 karolherbst: I doubt that writing into the 1024x1024 texture fails
14:44 karolherbst: so suldb doesn't read correctly from a 65536x16 texture
14:44 karolherbst: or well, nothing at all
14:45 pendingchaos: or perhaps sust doesn't write correctly
14:45 pendingchaos: (or both)
14:45 pendingchaos: unless the coordinate is 0,0?
14:45 karolherbst: no, sust writes correctly
14:45 karolherbst: why wouldn't it?
14:46 karolherbst: it is a 1024x1024 2d texture
14:46 karolherbst: if that fails, a lot of other things would fail as well
14:46 pendingchaos: perhaps I'm misunderstanding what the test is doing
14:46 karolherbst: also, it seems like the 2d ms image contains the correct data
14:46 karolherbst: at least according to qapitrace
14:47 karolherbst: so the texture itself should also be okay
14:48 pendingchaos: the source or destination image?
14:49 karolherbst: source
14:49 karolherbst: okay mhh
14:50 pendingchaos: what about the image written in the shader on line 100 in max-size.c?
14:50 pendingchaos: (the destination image)
14:52 karolherbst: okay, first draw: copy from 2D into 2D_MULTISAMPLE
14:53 karolherbst: dest texture contains correct data
14:53 karolherbst: or well, data at all, which is already a good sign
14:53 karolherbst: ohhhhh, wait
14:58 karolherbst: I don't really know what the second draw does
14:58 karolherbst: but the third one does something on src and dst being that 2dms image
15:01 nyef: Okay, same thing with needing mako to compile mesa on this thing...
15:01 karolherbst: pendingchaos: anyway, draws 1-3 behave the same in 8x8x16384x1 and 8x16384x8x1
15:02 nyef: ... but no complaints from portage when I try to install it. Yay!
15:03 kubast2_: A crash when moving a firefox window xfce4 arch linux 4.17.3 kernel 18.1.3 mesa http://termbin.com/zo4d
15:04 karolherbst: pendingchaos: and in the last draw the image data gets overwritten with (0,0,0,0), so I guess the store does something, but the load returns 0
15:05 kubast2_: Disabled modeset
15:05 pendingchaos: but doesn't the third draw also load from a 8x16384x8x1/8x8x16384x1 MS image?
15:06 kubast2_: And xorg doesn't run at all that was a bad idea
15:07 karolherbst: pendingchaos: fun
15:07 karolherbst: pendingchaos: capping width/height to 0x7fff does _something_
15:08 pendingchaos: I wonder what would the fourth/last draw be doing differently from the third that makes it fail
15:08 pendingchaos: capping the width/height in the TIC?
15:08 karolherbst: pendingchaos: yes
15:08 karolherbst: so half the output is correct
15:08 karolherbst: going above 0x7fff and everything is 0 again
15:09 karolherbst: ohh wait
15:09 karolherbst: actually 0x8000 is fine
15:09 karolherbst: but 0x8001 and things are 0
15:23 karolherbst: pendingchaos: well, I go back to figuring out the last CTS fail. I don't think that such big textures will be used, or at least not on nouveau anytime soon, so I guess we are fine
15:23 karolherbst: would be nice to still fix it though
15:24 pendingchaos: you sure the destination image for the third draw is correct?
15:24 pendingchaos:nods
15:25 pendingchaos: TWO_D_NO_MIPMAP could probably end up being used for the image2DMS case, assuming it causes no regressions
15:38 kubast2_: Yeah I will try to get a more meaningful report some time later on, can't do it rn
15:43 nyef: What the...? Is mesa set to compile assertions out or something?
15:45 pendingchaos: I think that's normal for release builds of software
15:45 pendingchaos: to improve performance
15:45 nyef: You misspelled "decrease reliability and debuggability".
15:52 nyef:really doesn't understand this "performance" rationale for playing field support in hard mode.
15:55 nyef: Okay, next hop backwards, looks like fb->Attachment[BUFFER_DEPTH].RenderBuffer isn't set up properly.
16:13 nyef: ... Probably either: Something isn't checking the return value from AllocStorage or something isn't calling AllocStorage in the first place.
16:16 karolherbst: nyef: you put asserts in places where you don't really expect the result, but what do you want to do? crash the application for users?
16:17 karolherbst: you should never hit an assertion in the first place
16:17 karolherbst: assert is not error handling
16:18 nyef: If you hit an assert, the application is *already* going to crash.
16:18 karolherbst: nyef: also, this is OpenGL, CPU overhead matters
16:18 karolherbst: nyef: not necessarily
16:18 karolherbst: you want to put error handler _after_ assert
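The pattern karolherbst describes, an assert followed by a real error path, might look like the C sketch below. The function `first_byte` is hypothetical and only illustrates the idea: in a debug build the `assert` stops at the broken precondition, while a build with `NDEBUG` defined compiles the assert out and falls through to the ordinary check.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the first byte of buf into *out. Returns 0 on success, -1 on bad input. */
int first_byte(const unsigned char *buf, size_t len, unsigned char *out)
{
    /* Debug builds: abort here so the bad caller is visible in a debugger. */
    assert(buf != NULL && len > 0 && out != NULL);

    /* Release builds (NDEBUG): the assert is gone, so handle the error. */
    if (buf == NULL || len == 0 || out == NULL)
        return -1;

    *out = buf[0];
    return 0;
}
```

The duplicated condition is the cost of this style: the assert documents what should never happen, and the `if` keeps a release build from dereferencing a bad pointer if it happens anyway.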
16:19 nyef: Okay, clearly "trace" in gdb doesn't do what I expected.
16:19 karolherbst: well, gdb should just stop at hitting an assert anyway
16:19 karolherbst: and then you can go back the stack
16:20 nyef: And the very idea that I've been spoiled by using "good" debugging tools instead of gdb is horrifying, so I must be doing something wrong.
16:20 karolherbst: trace is for collecting data
16:20 karolherbst: if you want to set a breakpoint, you use break ;)
16:21 nyef: I have a set of functions. I want to know when the program enters these functions, what the arguments are, when the program leaves the functions, and what the results are.
16:21 nyef: I don't want the program to STOP while doing so, however.
16:21 nyef: This, to me, says "trace".
16:21 karolherbst: okay, then you can use trace, right
16:22 karolherbst: but I think for tracepoints you have to do a remote debugging session or something
16:22 nyef: Dear god. The absolutely horrid debugging tools that I've been using for the past decade HAVE spoiled me for gdb.
16:23 karolherbst: which kind of makes sense ;)
16:23 nyef: I think I'm going to go whimper somewhere.
16:23 karolherbst: well you can always use a useable GUI frontend for gdb, if that's what you mean
16:24 karolherbst: there aren't really alternatives, except with visual studio you get that MS debugger
16:40 nyef: Hrm. PIPE_FORMAT_NONE ?
16:45 nyef: Something seems wrong with the internal format.
16:58 nyef: ... No, internal format looks sane-ish... Just, not.
17:06 kubast2: So I enabled verbose drm logging and increased buffer length to 16MB
17:06 kubast2: As I'm about to crash on my gt 710/or perhaps the moment when I will launch firefox
17:06 kubast2: is there something else I should provide besides dmesg log ?
17:07 kubast2: because magicsysrq doesn't work on my machine I will sorta have to sync and then reboot from remote ssh session
17:07 kubast2: because networking works just fine ,only screens freeze
17:07 kubast2: maybe keyboard too I'm not sure
17:09 kubast2: well strangely enough it works rn
17:09 kubast2: just as I enabled verbose drm logging
17:09 kubast2: maybe it's because I left the card idle and went back idk but I can't reproduce it rn
17:10 kubast2: I see the dmesg error is similar
17:10 kubast2: but it doesn't crash (a slightly different value)
17:11 kubast2: [ 164.471564] nouveau 0000:01:00.0: fifo: CHSW_ERROR 00000004 ;[ 486.718013] nouveau 0000:01:00.0: fifo: FB_FLUSH_TIMEOUT;[ 486.718216] nouveau 0000:01:00.0: fifo: CHSW_ERROR 00000003;
17:11 kubast2: but the value it crashes and sends endlessly is CHSW_ERROR 00000006; if I recall correctly
17:14 kubast2_: Freeze when setting resolution to a 2nd monitor
17:14 kubast2_: it's gonna shutdown soon and I will look at systemctl log
17:16 kubast2_: It was journalctl yeh
17:20 kubast2_: http://termbin.com/9due 19:12:55
17:21 kubast2_: Yeh xfce4 remembered the resolution
17:22 kubast2_: It freezes whenever I startxfce4 now
17:23 kubast2_: Unplugged vga
17:25 kubast2_: Crash earlier than with vga, gonna check dmesg output
17:33 kubast2_: termbin.com/k55w
17:34 kubast2_: 🤷
17:37 kubast2_: I can try and compile drm_next (https://cgit.freedesktop.org/~airlied/linux/) kernel with 18.1.3 mesa and 2.4.92 libdrm
17:43 nyef: Hrm... GDB's stepper seems unreliable?
17:49 kubast2_: xor log http://termbin.com/5hfq3
17:49 kubast2_: xorg*
17:53 kubast2_: vbios.rom, it sorta is related to output config? (like changing resolution? No?) https://transfer.sh/GKT0N/vbios.rom
17:55 kubast2_: karolherbst: Is there something I should do/provide for those crashes?
18:03 nyef: Ugh. Is there no easy way to look up symbolic names from errno values?
18:05 pendingchaos: strerror?
18:06 karolherbst: nyef: you can call any symbol from within gdb
18:06 karolherbst: nyef: also, the return type is int
18:06 nyef: Right, okay. Close enough.
18:06 karolherbst: or, erm
18:06 karolherbst: errno is type int or something
18:07 karolherbst: it works for enum types
18:08 nyef: strerror() isn't working. Returns a bogus pointer.
18:08 karolherbst: nyef: also, stepping is unreliable any time you don't compile with O0
18:08 karolherbst: and disable function inlining
18:08 karolherbst: like force disable it
18:08 karolherbst: nyef: "p strerror(errno)"?
18:09 nyef: "print strerror(22)" => $53 = -148010742
18:09 karolherbst: for me: 0x7ffff75cac65 "Invalid argument"
18:09 karolherbst: p strerror should print something like {char *(int)} 0x7ffff74d1eb0 <strerror>
18:09 karolherbst: maybe the symbol wasn't properly resolved?
18:10 nyef: I get something about {<text variable, no debug info>}, but otherwise in that ballpark.
18:10 karolherbst: nyef: uhm... why is the value negative?
18:10 karolherbst: why isn't it hex?
18:11 karolherbst: -148010742 looks like a valid pointer strerror might return
18:11 nyef: Because the high bit is set, and there's no type information, so it's an integer?
18:11 nyef: Casting it gets an error about cannot access memory at address.
18:11 karolherbst: uhm, weird
18:11 nyef: Yeah.
18:12 karolherbst: if you just drop he sign bit?
18:12 nyef: Overall, I am Not Impressed, but I realize that I'll probably have to tweak my system config in a manner that I only discovered recently and then rebuild the world before things will improve.
18:12 karolherbst: *the
18:12 karolherbst: nyef: doubtful
18:12 nyef: Ahh... Sign-extend-32?
18:12 nyef: It'd at least get me debug info for all of the system libraries.
18:13 karolherbst: well yeah, that usually makes things easier
18:13 karolherbst: but that strerror thing should work nonetheless
18:14 nyef: Too much insanity in the local environment, going the long way around.
18:14 karolherbst: so uhm try 0x7ffffffff72d890a
18:14 nyef: Hrm. Okay, will try.
18:15 karolherbst: nyef: ohh, right, it could be just a sign extended int value, but why would it be a 32 bit value to begin with?
18:16 karolherbst: or maybe indeed missing debuginfo kills it
18:16 karolherbst: and it assumes int for everything
18:16 nyef: Still invalid address.
18:16 karolherbst: mhh, maybe debug info for glibc alone would help quite a lot already
18:16 nyef: Yeah, the missing debuginfo means that it has to default to returning int (per C rules).
18:17 karolherbst: ahh wait
18:17 karolherbst: 0x7ffffffff72d890a is wrong :D
18:17 kubast2_: Gonna test it out under drm-next
18:17 karolherbst: we don't have a 4 bit mmu
18:17 karolherbst: nyef: 0x7ffff72d890a
18:17 nyef: EINVAL.
18:17 karolherbst: :)
18:17 karolherbst: so yeah
18:18 karolherbst: it discards high bits
18:18 karolherbst: and sign extends
18:18 nyef: (Found in /usr/include/asm-generic/errno-base.h)
18:18 karolherbst: try 0x7ffff72d890a though
18:18 karolherbst: it should work
18:19 nyef: "Invalid argument". Good call!
18:19 karolherbst: yeah
18:19 karolherbst: 48 bit addresses :)
18:19 kubast2: on crash under drm-next vs 4.17 after I did the vga res switch on 4.17
18:19 kubast2: *no
18:20 karolherbst: nyef: p (char*)strerror(22)
18:20 karolherbst: that should work as well
18:20 karolherbst: I think
18:20 karolherbst: ....
18:20 karolherbst: depends on when the casting happens
18:20 nyef: karolherbst: Too late, the truncation to int already happened.
18:21 karolherbst: okay, weirdo hack then
18:21 nyef: And I can't be bothered working out the syntax to take the address of strerror, cast it to an appropriate function pointer, and then call through that.
18:21 karolherbst: yeah
18:22 karolherbst: having debuginfo files for glibc sounds less messy
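The numbers in this exchange can be checked with a few lines of C. The sketch below shows how a 64-bit pointer turns into -148010742 when only its low 32 bits are kept and read as a signed int, and how the pointer is rebuilt from known upper bits. The helper names are hypothetical, and the upper bits `0x7fff00000000` are taken from the working address found in the discussion, not derived by the code.

```c
#include <stdint.h>
#include <string.h>

/* Keep only the low 32 bits and reinterpret them as a signed int, which is
 * what happens when a 64-bit pointer return value is treated as int. */
int32_t low32_as_int(uint64_t pointer_bits)
{
    uint32_t low = (uint32_t)pointer_bits;
    int32_t as_int;

    memcpy(&as_int, &low, sizeof as_int);
    return as_int;
}

/* Widening the int back to 64 bits copies the sign bit into the upper half. */
uint64_t sign_extend(int32_t value)
{
    return (uint64_t)(int64_t)value;
}

/* Rebuild the pointer from the surviving low half plus known upper bits. */
uint64_t restore_pointer(int32_t value, uint64_t upper_bits)
{
    return upper_bits | (uint32_t)value;
}
```

Here `low32_as_int(0x7ffff72d890a)` yields -148010742, `sign_extend` of that value yields `0xfffffffff72d890a`, and `restore_pointer(-148010742, 0x7fff00000000)` gives back `0x7ffff72d890a`. Inside gdb itself, the function-pointer cast nyef mentions is usually written along the lines of `p ((char *(*)(int)) strerror)(22)`, which should avoid the truncation without needing glibc debuginfo.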
18:22 nyef: So, next bit of fun: that's -EINVAL returned from drmCommandWriteRead() in abi16_bo_init() in nouveau_bo_new() for a 300x300 depth buffer.
18:23 nyef: I'm still in abi16_bo_init(), though access to actual values may be a bit iffy right-right now.
18:23 nyef: And this is on my MCP89 / NVAF machine.
18:28 kubast2: hmm drm_next works so far ,perhaps it's the lack of mixing libdrm-git and lib32-libdrm from repo
18:28 kubast2: seems that it has booted up where 4.17 has crashed 3 times in a row
18:28 kubast2: after the vga res change/and then unplug
18:28 nyef: Typical. glxgears doesn't check the return value of glXMakeCurrent(), which plausibly would have at least made the overall failure mode less egregiously stupid.
18:29 kubast2:hotplugs vga ,enables dual screen and sets the right resolution for vga
18:31 kubast2: disabled mirroring displays
18:31 kubast2: werks same goes for pstate change
18:31 kubast2: gonna try suspend/resume
18:31 nyef: Okay, I still have the request object. That's a start.
18:32 nyef: And I know that the failure is EINVAL, rather than ENOMEM or anything like that.
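For the earlier question about getting symbolic names from errno values, a small lookup table is one low-tech option. The C sketch below is an assumption-laden illustration: `errno_name` is a hypothetical helper, it covers only a handful of codes, and it accepts the negated form that kernel-style interfaces return (such as the -EINVAL seen here).

```c
#include <errno.h>
#include <stdlib.h>

/* Map an errno value (positive, or negated kernel-style) to its symbolic
 * name. Only a few common codes are listed; extend the table as needed. */
const char *errno_name(int err)
{
    switch (abs(err)) {
    case EINVAL: return "EINVAL";
    case ENOMEM: return "ENOMEM";
    case ENOENT: return "ENOENT";
    case EBUSY:  return "EBUSY";
    case EACCES: return "EACCES";
    default:     return "(not in table)";
    }
}
```

Newer glibc versions (2.32 and later) also provide `strerrorname_np()` under `_GNU_SOURCE`, which returns the symbolic name directly where it is available.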
18:32 kubast2: ouch insurgency crashed gonna check the logs
18:32 kubast2: of insurgency
18:37 kubast2: failed to create gl context; might be due to 0.1.0 older mesa?
18:39 kubast2: well works for blender so that's nice
18:46 crmlt: karolherbst: seems patch fixed it!
18:47 crmlt: karolherbst: X11 started flawlessly
18:48 nyef: crmlt: Was this changing the mcp89 mmu type, or something else?
18:49 kubast2: crmlt, same for me I compiled drm_next kernel ,after I switched the vga res and even disconnected the vga it started to crash every xorg launch
18:50 crmlt: nyef: yes the patch changes mmu to mcp77_mmu_new,
18:50 crmlt: see https://bugs.freedesktop.org/attachment.cgi?id=140478
18:50 kubast2: ah that's something different I guess
18:51 karolherbst: crmlt: okay, cool
18:52 nyef: Guess I'd better try this on my MCP89, make sure that it doesn't make things worse there. (-:
18:52 crmlt: kubast2: you're right
18:53 kubast2: I think you are using an out-of-tree rn?
18:53 kubast2: So my guess is that fix for my gt710 is already in drm-next
18:54 kubast2: *was
18:54 kubast2: yeah nvm u know the drill
18:54 kubast2: I have no idea what your issue was nor what I'm right about crmlt
18:55 crmlt: nyef: works much better, compositor is working faster too, i hope it will be stable
18:55 kubast2: I'm crossing fingers for lib32-libdrm not to crash/trash nouveau at threading tests again
18:56 kubast2: *the "test" of lib32-libdrm after compilation it's in both official arch pkgbuild and the aur one
19:01 crmlt: karolherbst: btw. this is what i get from dri/0/pstate https://hastebin.com/abufekodil.go
19:01 crmlt: the AC line has 0 mhz memory speed
19:04 nyef: crmlt: That's system shared memory, reclocking it is fairly unlikely to be a safe thing to do.
19:11 crmlt: another issue is that if I change pstate to 0f it breaks suspend mode, the machine is then unable to successfully suspend
19:11 crmlt: while in default state it does work great
19:12 nyef: ... I'd be more concerned about that if I were on a laptop, though I distinctly remember having suspend issues with an all-in-one at some point.
19:19 nyef: crmlt: If you step the pstate up, can you suspend if you step it back down?
19:20 nyef: (More idle curiosity on my part than anything else, really.)
19:21 crmlt: by default there isn't any state highlighted
19:22 crmlt: but I could play with it for a while...
19:22 nyef: Ah, right.
19:29 crmlt: same behaviour on all levels
19:30 crmlt: echo 03 even completely freezes whole system
19:33 nyef: Hrm. Sounds like I should possibly play with reclocking, myself.
19:47 nyef: Oh, neat. Glxgears now works.
19:48 crmlt: Cool
19:49 nyef: gxine still breaks things, though.
20:02 pendingchaos: am I correct in thinking that the meaning of the combined thread id special register (tid, not tidx, tidy or tidz) has been the same since fermi?
20:04 HdkR: pendingchaos: Tesla even
20:08 pendingchaos: what is the status of compute with tesla btw? there seems to be some code for it but ARB_compute_shader is not enabled
20:27 kubast2: I have contacted guys behind insurgency about the game failing to start
20:27 kubast2: since dead island and left 4 dead 2 work just fine
20:51 nyef:sighs.
20:51 nyef: "link rate unsupported by sink".
20:51 nyef: I don't think nouveau likes *any* of my hardware.
20:53 kubast2: it does love my laptop's gpu
20:53 kubast2: and my gt 710 after I put drm_next kernel on it
20:57 nyef: HDMI audio is broken on MCP89 on some displays, notably including the PlayStation 3D Display. I can lock up X on my MCP89 by loading gxine and moving the window around. My laptops are stupidly quirky when it comes to suspend/resume and HDMI input. And it turns out that my one external DPort screen won't link-train with nouveau.