02:12AndrewR: sorry, NV50_PROG_USE_NIR=1 piglit resulted in a hung GPU (nv92) :}
02:43ccr:grumbles at NOUVEAU_ERR() macro ..
02:45ccr: would be great if it printed __FILE__ as well, hitting static function temp() in either nv30/{nvfx_fragprog.c nvfx_vertprog.c}
02:46ccr: guess it's fragment shader anyway
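The wish above, an error macro that also reports the source file, can be sketched as follows. This is only an illustration: `EXAMPLE_ERR` and `example_err_format` are hypothetical names and this is not the actual NOUVEAU_ERR() definition from Mesa. The point is that two static functions both named temp() become distinguishable once `__FILE__` is part of the prefix.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: writes "file:func:line - message" into buf.
 * Returns the number of characters the full message needs. */
static int example_err_format(char *buf, size_t len, const char *file,
                              const char *func, int line, const char *fmt, ...)
{
    va_list ap;
    int n = snprintf(buf, len, "%s:%s:%d - ", file, func, line);

    if (n < 0 || (size_t)n >= len)
        return n; /* prefix alone did not fit (or failed) */

    va_start(ap, fmt);
    n += vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return n;
}

/* Hypothetical macro: same idea as an error-print macro, but with
 * __FILE__ added in front of the function name and line number. */
#define EXAMPLE_ERR(...)                                              \
    do {                                                              \
        char msg_[256];                                               \
        example_err_format(msg_, sizeof(msg_), __FILE__, __func__,    \
                           __LINE__, __VA_ARGS__);                    \
        fputs(msg_, stderr);                                          \
    } while (0)
```

With this, a call such as `EXAMPLE_ERR("out of temps\n")` inside temp() would name the file it was compiled from, so the fragment and vertex program variants no longer look identical in the output.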
18:29Lyude: if you guys could have a text width limit on envytools, would you, and what would it be? (wanted to update nvbios to understand all of the vbios init opcodes we know so far, but also the style is all over the place and hurts to look at so I figured I'd add a clang-format file while I'm at it)
18:40imirkin: 80 chars
18:40Lyude: imirkin: ...smaller than the kernel?
18:41imirkin: don't know or care what the kernel says
18:41Lyude:was really hoping for at least 100
18:41Lyude:really wants 100, actually
18:41imirkin: i think 80 is a fairly optimal width for code
18:41imirkin: but perhaps i'm biased since it works so well on my setup?
18:41Lyude: i did until the linux kernel switched to 100 and oh boy, 20 chars makes a lot of difference
18:41imirkin: yeah. the difference between things fitting on my screen and not fitting ;)
18:41imirkin: [without line-wrapping]
18:42Lyude: imirkin: do you not have a 16:9 screen?
18:42imirkin: i have 2x 10:16 screens
18:42imirkin: (aka rotated 16:10 screens ... 1920x1200)
18:43imirkin: 2x 80 chars fits REALLY nicely on one screen
18:44imirkin: that said, i'm never in favor of hard limits
18:44imirkin: on occasion, fitting within the limit is worse than not
18:44imirkin: this is also why i don't like automated formatting tools
18:44imirkin: since they follow specific rules, not harder-to-measure things like "readability"
18:44Lyude: imirkin: i mean i'm definitely not running clang-format without going over the results
18:44imirkin: sure
18:46Lyude: skeggsb, opinions?
18:48Lyude: also karolherbst, opinions on a text width for envytools?
18:48Lyude: I'm not converting the whole thing but would like to convert some of the files I'm working on that are very much all over the place style wise
18:49karolherbst: Lyude: I don't care how wide those are as long as it's not ridiculous
18:49karolherbst: ask mwk
18:57imirkin: Lyude: there are some nvbios files in desperate need of ... attention
18:57imirkin: it's not the cleanest code around
18:57Lyude: imirkin: yeah :), I've seen lol
18:57imirkin: i believe skeggsb had some hopes of converting nvbios to use the nouveau bios parser stuff
18:57imirkin: and/or had some variant of it already
18:57karolherbst: which is probably a good idea regardless
18:58imirkin: of course there are still two vbios parsers in nouveau
18:58imirkin: so nothing is perfect ;)
18:58karolherbst: at least nouveau handles the newer files correctly
18:58imirkin: yeah
18:58RSpliet: speaking of perfect, can we now do SPIR-V->NIR->TGSI->NV50_IR? :-P
18:59karolherbst: imirkin: I am actually wondering why we don't do packed struct/union magic :D
18:59imirkin: RSpliet: and then back -> SPIR-V again ;)
18:59karolherbst: but that's probably too messy
18:59karolherbst: RSpliet: sure
18:59karolherbst: but why? :D
19:00karolherbst: ntt is really for drivers not wanting to support nir directly, like nv30
19:00Lyude: karolherbst: packed can be slow sometimes but if that doesn't matter packed rules
19:00karolherbst: or I think the end goal was even to just use it in gallium and get rid of glsl -> tgsi entirely
19:00Lyude: if you don't need to do __attribute__((packed)) and only need bitfields though that helps
19:00karolherbst: Lyude: vbios parsing :p
19:00Lyude: karolherbst: ah yeah
19:01karolherbst: but this doesn't map nicely to some tables I think
19:01karolherbst: some tables are just stupid inconsistent
19:01karolherbst: even within the same version
19:01karolherbst: different length and other fun
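The packed-struct idea and its limits can be sketched as follows. Everything here is an assumption for illustration: the header layout is invented and does not describe any real VBIOS table, `example_table_hdr` and `example_entry` are hypothetical names, and `__attribute__((packed))` is a GCC/Clang extension. The sketch reads the header through a packed struct but takes the header and entry sizes from the table itself, since the tables can differ in length even within one version.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Invented layout, for illustration only. Without the packed attribute
 * the compiler would insert padding before the 16-bit field. */
struct example_table_hdr {
    uint8_t  version;
    uint16_t flags;     /* host byte order after memcpy; assumes a
                         * little-endian host if the table is LE */
    uint8_t  hdr_len;   /* header size in bytes, may exceed the struct */
    uint8_t  entry_len; /* size of one entry in bytes */
    uint8_t  entry_cnt; /* number of entries */
} __attribute__((packed));

/* Returns a pointer to entry idx, or NULL if the table is too short,
 * the header claims to be smaller than the known fields, or idx is
 * out of range. */
const uint8_t *example_entry(const uint8_t *table, size_t size, unsigned idx)
{
    struct example_table_hdr hdr;
    size_t off;

    if (size < sizeof(hdr))
        return NULL;
    /* Copy instead of casting the pointer, to avoid unaligned access. */
    memcpy(&hdr, table, sizeof(hdr));

    if (hdr.hdr_len < sizeof(hdr) || idx >= hdr.entry_cnt)
        return NULL;

    /* Trust the lengths stored in the table, not sizeof(). */
    off = hdr.hdr_len + (size_t)idx * hdr.entry_len;
    if (off + hdr.entry_len > size)
        return NULL;
    return table + off;
}
```

The struct only covers the fields every revision shares; anything beyond that is reached through the lengths in the data, which is where a pure struct/union mapping starts to get messy.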
19:02RSpliet: karolherbst: for the glory of satan of course!
19:27AndrewR: and just as I was about to show my latest hack, karol disappeared ....
19:27AndrewR: https://pastebin.com/PESugQkM
19:50AndrewR: so, "[45013.395809] nouveau 0000:02:00.0: fifo: channel 5 [seamonkey[7039]] unload timeout" :) (was abusing GPU with tocl benchmark + my patch )
19:50AndrewR: but at least main display on another GPU still works
20:21Lyude: we never powered off the edp panel in nouveau.
20:22Lyude: that explains a lot.
20:22Lyude: (no, the vbios scripts (at least the ones we call from nouveau currently) apparently do not do this)
21:19RSpliet: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/4
21:19RSpliet: Apparently someone is trying to plug random graphics cards into a Raspberry Pi 4
21:19RSpliet: And had nouveau fall over
21:21imirkin: iirc ARM devices don't end up providing enough PCI window
21:21imirkin: so the BARs never get allocated
21:22imirkin: (at least not all of them)
21:23ericonr: they might have issues with floating point stuff
21:23ericonr: at least on the AMD side
21:24ericonr: 5700XT support for aarch64 required a bit of kernel hacking to fix floating point exceptions
21:24imirkin: well, if you look at the issue, it's definitely talking about BAR space.
21:25ericonr: oh yeah, I just think they might have issues after that's sorted out
21:25ericonr: all the power, tho
21:37HdkR: Float exceptions? In the kernel?
21:38HdkR: Sounds like a pretty bad bug if Radeon+ARM is causing float problems
21:42ericonr: it's ok on x86 specifically
21:42ericonr: from what I understand
21:42ericonr: ppc64 shows the issues more clearly
21:43karolherbst: huh, but that's little endian arm, no?
21:43HdkR: Nobody should ever run big endian ARM
21:44karolherbst: HdkR: you misspelled "Nobody should ever run big endian"
21:44ericonr: karolherbst: sorry, ppc64le
21:44HdkR: That too
21:44ericonr: it's more about the guard code around fp operations than endian issues
21:44karolherbst:cries in s390x
21:44karolherbst: ericonr: probably
21:44karolherbst: all the fp stuff is just terrible
21:44ericonr: :)
21:44HdkR: Stop doing FP in the kernel? :P
21:45ericonr: sounds reasonable to me
21:45karolherbst: stop messing with global fp state
21:45karolherbst: ericonr: do you have code flipping rounding mode?
21:45karolherbst: like compiled with Ofast?
21:45Lyude: note there's only one user of fp in the kernel
21:45karolherbst: ahh, this is a kernel problem?
21:45karolherbst:doesn't know the context
21:46ericonr: karolherbst: I don't. And I don't have ppc64le hardware either, I just know this from second hand
21:46ericonr: and indeed, it's a kernel issue in amd drivers
21:46karolherbst: I know that things break if you compile stuff with Ofast
21:46ericonr: is it even noticeably faster?
21:46karolherbst: benchmarks say so
21:46ericonr: I only know it's dangerous
21:47karolherbst: you know, Ofast exists so that CPU vendors can show off with huge numbers (and while also disabling mitigations)
21:47karolherbst: some still use Ofast "because"
21:47ericonr: avx512 is more showoff-y, I think
21:47ericonr: but it's at least "correct", usually
21:48karolherbst: avx512 is just terrible
21:48karolherbst: avx512 only exists for intel management so they can boast about how their CPUs have more threads than GPUs
21:49HdkR: AVX512's only redeeming quality is that it allowed Intel to optimize `rep stos` and `rep movsb`. We can now throw away avx512 since we gained that :P
21:49karolherbst: :p
21:49karolherbst: it's probably still slower
21:49karolherbst: also, what do you do with your 200 loc assembler memcpys?
21:49karolherbst: all for nothing now?
21:50HdkR: Pretty sure it's the first check in glibc now?
21:50karolherbst: so if you don't have it, it's now even slower?
21:50ericonr: not for nothing, they help slow down whole cpu clock :)
21:50karolherbst: ericonr: nope, they actually make memcpy faster
21:51ericonr: karolherbst: I meant that other cores not doing memcpy still suffer
21:51karolherbst: but this is all just super terrible anyway
21:51HdkR: nah, glibc uses a dynamic dispatch
21:51karolherbst: HdkR: at least something
21:51ericonr: but eh, what do I know, I daily drive musl anyway
21:51karolherbst: but as long as intel still thinks that SIMD is a good idea we will have to deal with that crap
21:51HdkR: yea :|
21:52karolherbst: does musl have a fast or trivial memcpy implementation? :D
21:52karolherbst:thinks if trivial and fast implementations are not the same the technical base is just horribly broken
21:53ericonr: karolherbst: it has specialized asm for some stuff, including memcpy iirc
21:53HdkR: It's pretty basic compared to glibc's version though
21:53ericonr: https://git.musl-libc.org/cgit/musl/tree/src/string/x86_64/memcpy.s
21:53ericonr: probably is
21:53karolherbst: well, you can also write it in C and cry every time auto vectorization fails
21:54karolherbst: ahh yeah
21:54karolherbst: ericonr: that is slow memcpy :p
21:54ericonr: :)
21:54karolherbst: fast memcpy looks more like 5 loops and shit
21:54ericonr: I don't really understand x86_64 asm
21:54HdkR: and a dispatch table and cpuid detection
21:55ericonr: I think felker cares a lot more for simplicity than squeezing cycles out
21:55karolherbst: ericonr: https://twitter.com/jfbastien/status/1288232681432440834 :p
21:55HdkR: and then the ideal is `rep movsb; ret` now ;)
21:55ericonr: karolherbst: re. fast and trivial should be the same, I agree
21:55ericonr: karolherbst: my eyes!
21:56karolherbst: :p
21:56karolherbst: but it's fast
21:56karolherbst: and in C
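The "dynamic dispatch" plus "cpuid detection" mentioned above can be sketched as a function pointer that is resolved on first use. This is a minimal sketch under stated assumptions, not how any libc does it: all `example_*` names are hypothetical, AVX2 is used only as a stand-in feature check via the GCC/Clang builtin `__builtin_cpu_supports`, the "wide" variant simply forwards to the libc memcpy, and the first call is not thread-safe. glibc itself picks its variant at load time through IFUNC resolution, which this sketch does not use.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*example_memcpy_fn)(void *, const void *, size_t);

/* Trivial fallback: copy one byte at a time. */
static void *example_memcpy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a tuned variant; a real one would use wide loads and
 * stores or `rep movsb`. Here it just forwards to the libc memcpy. */
static void *example_memcpy_wide(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Feature check; AVX2 is only an example feature. Returns 0 on
 * compilers or architectures where the builtin is not available. */
static int example_cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

static void *example_memcpy_resolve(void *dst, const void *src, size_t n);

/* Starts out pointing at the resolver, which replaces it on first call. */
static example_memcpy_fn example_memcpy_impl = example_memcpy_resolve;

static void *example_memcpy_resolve(void *dst, const void *src, size_t n)
{
    example_memcpy_impl = example_cpu_has_avx2() ? example_memcpy_wide
                                                 : example_memcpy_bytes;
    return example_memcpy_impl(dst, src, n);
}

/* Public entry point: one indirect call after the first use. */
void *example_memcpy(void *dst, const void *src, size_t n)
{
    return example_memcpy_impl(dst, src, n);
}
```

Because the check runs once and only swaps a pointer, a CPU without the feature pays for the detection a single time rather than on every copy.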
21:57HdkR: x86 performance characteristics are silly
21:57karolherbst: I think that's true for most CPU archs
21:58karolherbst: I mean, I kind of understand
21:58karolherbst: they still have to deal with shitty code because people think writing "parallel" code means inserting locks everywhere
22:00HdkR: ARM is still pretty sane. Interleave your loadstores and unroll the loop to roughly the number of pipelines
22:00karolherbst: that sounds terrible
22:00HdkR: Sadly we don't have a rep movsb equivalent there yet
22:01HdkR: ARM did gain a 64byte cacheline copy for PCIe bursts though
22:01karolherbst: "uff"
22:02karolherbst: I think the fate of all CPU archs is, that they suffer from legacy stuff they can't break
22:05HdkR: Which is semi-reasonable
22:05HdkR: Implement it right the first time, otherwise pain for all
22:05karolherbst: I don't even think x86 did anything "wrong" at the time, they just can't fix it
22:06karolherbst: if you'd write your x86 programs in the CPU's native ISA it would probably look beautiful
22:06karolherbst:has seen some native ISA stuff once and it just looked like GPU code
22:07HdkR: Sadly if you wrote your code in native CPU ISA it would also break in a single generation
22:08karolherbst: sure
22:08karolherbst: but there was this idea of just compiling everything to llvm ir and do the final compilations on target machines, but I guess that was also painful
22:08karolherbst: :D
22:09HdkR: oops, now your struct alignments are mangled
22:09ericonr: karolherbst: didn't apple do this?
22:09ericonr: I think I read something about them distributing bitcode for some stuff
22:10karolherbst: nope
22:10karolherbst: they played around with it
22:10karolherbst: but it was never used
22:10HdkR: They required IR submission for Watch apps submitted to the store, but as far as I know they only ended up using it to ensure people weren't doing evil things
22:11HdkR: LLVM IR isn't good enough for cross-arch compiling it turns out :P
22:11karolherbst: who knew
22:11karolherbst: :D
22:12karolherbst: I think a lot of companies ran into this mistake
22:12karolherbst: unstable IR? sounds good :p
22:12HdkR: perfect, ship dxil
22:12karolherbst: imagine khronos wouldn't have dropped SPIR
22:13HdkR: Oops, SPIR, push that under a rug
22:14karolherbst: llvm ir was a mistake :p
22:16HdkR: Using it outside of LLVM is a mistake. It should be an internal IR only
22:16karolherbst: yep
22:17karolherbst: maybe somebody should just submit an MR to remove the API :D
22:17imirkin: tell that to llvmpipe
22:18karolherbst: mhh, isn't that different?
22:18HdkR: LLVM should have had a stable ingest IR first :P
22:18imirkin: it's using llvm ir outside of llvm
22:18karolherbst: HdkR: "ouch"
22:18karolherbst: you mean, spirv? :D
22:19HdkR: haha
22:20karolherbst: imirkin: but I think there was a difference between constructing the stuff while you go and reusing those binary files
22:20karolherbst: but yeah.. maybe it's all the same
22:20imirkin:is just trolling
22:20karolherbst: yeah.. I think what we do is just doing this "string" based stuff where you just use strings for the opcode names :p
22:46astlydichrar: hi there! I'm looking for some guidance on the "Blank monitor, flicker, snow, or other random live image corruption" section of the troubleshooting page
22:47astlydichrar: basically, I have an Intel + Nvidia 1050 setup on a laptop and this is happening with my external screen: https://www.youtube.com/watch?v=W6GFjHwbcv0
22:48Lyude: oh wow
22:48Lyude: that's a bizarre one
22:48astlydichrar: yep! nobody on /r/debian even commented on a post of mine about that
22:49astlydichrar: the card is supposed to be supported by nouveau though, it's gp107
22:49Lyude: it eerily reminds me of underflow reporting
22:50Lyude: skeggsb: if you have any time, mind taking a look at that video? ^ would be appreciated if you could look at the issue as well, but if you don't have the time to, just let me know. I'm mostly curious if you've seen anything like that before, and if maybe my guess about underflow reporting sounds right
22:51Lyude: astlydichrar: are you on X by chance and if so, what xf86-video driver are you using (modesetting or nouveau?)
22:52astlydichrar: this is on wayland
22:53Lyude: sweet, I hate looking at X :)
22:53astlydichrar: how would I figure out which xf86-video driver I am using?
22:54Lyude: karolherbst: btw, are we using the gitlab for tracking nouveau issues yet
22:54Lyude: astlydichrar: you don't, you're using wayland
22:54karolherbst: Lyude: yeah
22:54Lyude: oh cool
22:54Lyude: i always lose track of the kernel bz so that helps
22:54karolherbst: https://gitlab.freedesktop.org/drm/nouveau/-/issues
22:54karolherbst: it slowly fills
22:54karolherbst: you should have full access
22:55Lyude: astlydichrar: I don't have the time to look today and I'm not sure what skeggsb is up to, but mind filing a bug there so I don't lose track of this?
22:55astlydichrar: sure, I will :)
22:55Lyude: cool, link to it afterwards and I'll watch it on gitlab
23:19astlydichrar: Lyude: should I include a VBIOS dump?
23:20Lyude: astlydichrar: might not be a bad idea tbh, also you can ignore the instructions on the wiki and just grab /sys/kernel/debug/dri/1/vbios.rom and /sys/kernel/debug/dri/1/strap_peek and upload those
23:34astlydichrar: Lyude: could you copy-paste the instructions again? had to reboot my computer and can't see your previous message, sorry
23:38ericonr: astlydichrar: grab /sys/kernel/debug/dri/1/vbios.rom and /sys/kernel/debug/dri/1/strap_peek
23:45astlydichrar: thanks! issue posted here: https://gitlab.freedesktop.org/drm/nouveau/-/issues/17