00:58TheXzoron: so after trying to autostart sway at login my whole system locked up but i can connect to it over ssh
01:00gnarface: that suggests just Xorg locked up, not the whole system
01:02gnarface: if you kill it and sway over ssh you might be able to regain control of the keyboard and console
01:03TheXzoron: that worked
01:04TheXzoron: now any idea why it isn't starting or would that be for another irc channel
01:04gnarface: if the kernel had actually i/o locked you wouldn't have been able to even connect by ssh; it wouldn't even have responded to pings
01:04gnarface: uh, i can only guess, i'm not a nouveau developer, sorry
01:04gnarface: but you might want to try disabling compositing?
01:05TheXzoron: i get this when trying to start it [main.c:48] [wlc] Failed to add socket to wayland display
01:05gnarface: or even just try a less graphically advanced display manager
01:05gnarface: OH you're using wayland too
01:05TheXzoron: i log in via a tty
01:05TheXzoron: display managers are rubbish imo
01:05gnarface: yea that's another complication i don't know much about, but as far as i understood it, doesn't wayland actually work with compositing always-on?
01:06gnarface: hmmm, someone in here who is asleep or afk right now might actually know the answer
01:06gnarface: which GPU are you using, and which linux kernel, just curious?
01:06TheXzoron: 780ti and 4.13.15
01:07TheXzoron: i haven't booted into 4.14 yet
01:07gnarface: i just looked up sway, wasn't familiar with it
01:07gnarface: yea, i couldn't guess whether the bug is in sway or nouveau
01:08gnarface: i'd lean towards the latter, but i don't know how to tell
01:08gnarface: arch linux's wiki warns sway is a work in progress
01:08gnarface: have you compared to other wayland compositors? (i don't even know what is available)
01:09TheXzoron: never used wayland
01:09gnarface: if it only happens with sway, maybe their irc channel might know
01:09TheXzoron: now that i do not use the blob anymore i decided to try it out
01:10TheXzoron: guess ill load xorg back up so i can get off my phone
01:12gnarface: xorg should be pretty stable if you avoid compositing
01:12gnarface: with a compositor, ymmv
01:12TheXzoron: I mean it worked before
01:12TheXzoron: yeah w/o a compositor I can't get rid of tearing
01:12gnarface: really? hmmm
01:12gnarface: you sure?
01:13TheXzoron: is there something I need in xorg.conf to fix tearing?
01:13TheXzoron: I thought it was just symptomatic of nouveau
01:14gnarface: i think there is an option for xorg.conf to enable vsync, yea. weird though the manpage says it's on by default. i'd check the Xorg log to see what's up, the manpage may be out of date
01:14gnarface: Option "GLXVBlank" "boolean"
01:15gnarface: PageFlip, SwapLimit and DRI may affect it
01:15gnarface: i think
01:15TheXzoron: is a compositor needed other than effects because personally I liked things just popping up and looking screwy before they loaded fully
01:16TheXzoron: I had it set to use glamor and dri 3
01:16TheXzoron: but I got lock ups after coming back to my computer at xscreensaver
01:17TheXzoron: didn't bother to try to ssh and just rebooted
01:17gnarface: a compositor is not needed for Xorg, and as far as i know only required to opengl-accelerate the desktop (which is hardwired into the way wayland works, as well as a couple misguided window managers for Xorg, but not all of them)
01:17gnarface: xscreensaver might be more stable if you just omit the opengl modules - they'll even lock up the official driver, honestly. they're not very well coded.
01:18TheXzoron: I mean I've been running all the xscreensaver hacks for years now
01:18TheXzoron: the blob never had issues
01:18gnarface: oh? well lucky you then
01:18gnarface: i've had all kinds of trouble with it historically
01:18gnarface: but the non-opengl ones have always been stable
01:19TheXzoron: it may have been related to how i set the gpu to the lowest performance state
01:19gnarface: as for the lack of tearing with compositing, i think that's just a side effect of default driver settings for opengl apps
01:21gnarface: i don't know much about glamor but enabling that may be the source of your instability
01:21gnarface: i thought i remembered them talking about it in here the other day
01:21TheXzoron: yeah that was me
01:21gnarface: oh, heh, sorry
01:21TheXzoron: I thought it would be better to use
01:22TheXzoron: but I don't really know much about how this works
01:22TheXzoron: also I just recalled last crash was with stock nouveau settings
01:22TheXzoron: no modesetting or glamor
01:22TheXzoron: so I think it is related to the lowest performance state
01:22TheXzoron: as I haven't had anything since I didn't muck with that value
01:23gnarface: could be
01:23TheXzoron: don't know why that would be though
01:23TheXzoron: should just run slow
01:23gnarface: race condition in crappy thread handling would be my vague guess
01:24gnarface: i couldn't be sure the problem isn't with mesa, either. i was messing with an old laptop with intel hardware recently and i noted sporadic segfaults in e17 with it at the same time that i realized the xorg log was reporting only dri 2 when the docs said that hardware should be able to support dri 3
01:25gnarface: that was kernel 4.9 though
01:25TheXzoron: what exactly does the dri value do
01:59TheXzoron: seems I got wayland working
01:59TheXzoron: really dumb what the issue actually was
02:00TheXzoron: XDG_RUNTIME_DIR wasn't specified
02:00TheXzoron: so I just set it to my home dir
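The fix above amounts to giving the compositor a usable XDG_RUNTIME_DIR. The XDG Base Directory specification asks for more than "some directory": it must be owned by the user and have mode 0700, which a home directory may or may not satisfy. A minimal sketch of such a check, assuming a POSIX system; the helper name `check_runtime_dir` is hypothetical and not part of sway or wlc:

```python
import os
import stat
import tempfile


def check_runtime_dir(path):
    """Return a list of problems with a candidate XDG_RUNTIME_DIR (empty if none).

    Hypothetical helper: it only checks the ownership and mode requirements
    of the XDG Base Directory specification, nothing compositor-specific.
    """
    if not path:
        return ["XDG_RUNTIME_DIR is not set"]
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    st = os.stat(path)
    if st.st_uid != os.getuid():
        problems.append("not owned by the current user")
    if stat.S_IMODE(st.st_mode) != 0o700:
        problems.append("mode is not 0700")
    return problems
```

Calling `check_runtime_dir(os.environ.get("XDG_RUNTIME_DIR"))` before starting the compositor from a tty would flag the unset variable that caused the "Failed to add socket to wayland display" error here.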
02:04TheXzoron: now to figure out all the software I need to replace
02:04TheXzoron: so long xscreensaver
02:08koz_: I'm a bit confused about https://github.com/CPFL/gdev ; does this actually work on Nouveau? If so, does that mean I can do compute stuff with it?
02:12airlied: it's a pretty dead project
02:12airlied: unless the new committer is up for some serious work
02:30gnarface: TheXzoron: sorry, had wandered away. glad you got it working. i think the DRI value just sets the direct rendering protocol version or something like that
02:31koz_: airlied: I have no idea - this is a response I got after two years.
02:31koz_: Whether this person is up for work, serious or not, I dunno.
03:13Aristar: nouveau 0000:01:00.0: X nv50cal_space: -16
09:39perfinion: imirkin: so using Options AccelMethod none makes it work, it goes to X and stays there. on kernel 4.14.4
09:40perfinion: whats the whole exa vs glamor thing? it says default is exa, should i try glamor?
14:33Aristar: would disabling kernel.hardlockup_panic potentially allow nouveau to recover when shit completely freezes to the point where sysrq doesn't even respond? eventually hw watchdog triggers though
14:33Aristar: 128MB ddr2 vram sucks on this old craptop
14:34Aristar: and 32bit addressing from bios
14:34Aristar: even though it runs x86_64
14:35Aristar: like all 32bit tables and garts and page mapping
14:35Aristar: i'm guessing the answer is no, but wondering now if maybe some obscure kernel cmdline parameters may help
14:36Aristar: cause it occurs pretty reliably if too many browsers are open
14:36Aristar: even with gpu disabled in browsers
14:37Aristar: like chromium/chrome flags to disable ALL the guy stuffs, the switches span like 6 lines of word wrap
14:45imirkin: perfinion: you definitely want nothing to do with glamor, esp on a nv4x.
14:45imirkin: Aristar: nv50cal_space means your gpu has hung or you're submitting commands way too fast
17:37karolherbst: pmoreau: that test fails due to missing support for OpenCLLIB::Nextafter
19:16pmoreau: karolherbst: Which test?
19:16karolherbst: pmoreau: kernel_limit_constants
19:17karolherbst: nextafter is pain to implement... a lot of branching
19:17pmoreau: Haven’t looked yet at what it does
19:19karolherbst: basically the next representable floating point value towards a direction
19:19pmoreau: Hum, indeed
19:20pmoreau: I started to get rid of paddings in SpirVValue yesterday, and I’m most likely going to continue that today.
19:32karolherbst: nice, nextfloat(MAXFLOAT, inf) == inf already passes
19:33karolherbst: now I need to fix nextfloat(-MAXFLOAT, -inf) == -inf
19:34karolherbst: pmoreau: https://gist.githubusercontent.com/karolherbst/824ca9756d7254ce4bbfdac04b46d9a1/raw/01edd860a560dc60e6d86e356a7678193a4a7f58/gistfile1.txt
19:43karolherbst: meh.. my stuff only works for positive floats
19:45karolherbst: and it doesn't work for 0 correctly either
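The edge cases mentioned above (negative inputs, zero, stepping from MAXFLOAT to infinity) can be illustrated on the raw bit pattern of a 32-bit float. This is only a sketch of the general technique, not the code from the linked gist or commit; the function names are hypothetical, and inputs are assumed to be exactly representable as float32:

```python
import math
import struct


def f32_bits(x):
    """Reinterpret a float32 value as its unsigned 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Reinterpret an unsigned 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def nextafter_f32(x, y):
    """Next representable float32 after x in the direction of y."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == y:
        return y
    if x == 0.0:
        # Zero has no neighbour reachable by +/-1 on its own bits:
        # step to the smallest subnormal carrying the sign of the direction.
        return bits_f32(0x00000001 if y > 0 else 0x80000001)
    bits = f32_bits(x)
    # The sign lives in the top bit, so the low 31 bits are a magnitude.
    # Moving away from zero increments it, moving toward zero decrements it.
    if (x < y) == (x > 0):
        bits += 1
    else:
        bits -= 1
    return bits_f32(bits)
```

The branches on NaN, equality, zero, and sign are where the "a lot of branching" comes from; on a GPU each of them becomes either a predicated instruction or a separate basic block.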
19:52karolherbst: "PASSED test." \o/
19:54karolherbst: pmoreau: do you think I got the edges right here? https://github.com/karolherbst/mesa/commit/106dd5190664ca6669d34bcbfcbe4dfa92a0d7b8
19:57pmoreau: Shouldn’t this one be a tree as well? https://github.com/karolherbst/mesa/commit/106dd5190664ca6669d34bcbfcbe4dfa92a0d7b8#diff-95fbaa87866a7490a15d06af87e3e69bR3395
19:57karolherbst: mhh, right
19:57karolherbst: makes sense
20:00karolherbst: and I need to change some to CROSS to endBB
20:00pmoreau: Also, (IIRC it’s the same in NVIR as in SPIR-V) branch & co have to be the last instruction of a BB
20:00pmoreau: You can’t have two branch insn within the same BB.
20:01karolherbst: then I need to create more BBs?
20:01pmoreau: I think so.
20:01karolherbst: okay, let's see if I manage to get it working
20:02pmoreau: imirkin: Do you confirm one can only have a single branch instruction (or similar) per BB?
20:06tobijk: pmoreau: why would only one branch be allowed per bb?
20:06karolherbst: pmoreau: like that? https://gist.githubusercontent.com/karolherbst/107148c31099a6ba9da7160ddb2b2807/raw/52b6a74cf7e0e80db41bc176b15369202f6333d8/gistfile1.txt
20:06mwk: tobijk: by definition of a bb...
20:06karolherbst: although I might just skip those false branch bras
20:06karolherbst: they seem pretty pointless
20:07mwk: a bb is a block of instructions that always executes in an uninterrupted sequence, so only the last instruction can be a branch
20:07pmoreau: tobijk: Makes things easier when processing the BBs? Otherwise it would be super painful for computing the live range of variables if you could have a branch right in the middle of a BB
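The rule discussed above (a branch can only be the last instruction of a basic block) falls out of how blocks are usually carved from a linear instruction list. The sketch below shows the textbook "leader" approach; the instruction encoding as `(opcode, target_index)` tuples and the opcode names in `BRANCH_OPS` are made up for illustration and are not the NVIR or SPIR-V representation:

```python
# Hypothetical opcode names that end a basic block.
BRANCH_OPS = {"bra", "ret"}


def split_basic_blocks(insns):
    """Split a list of (opcode, target_index_or_None) into basic blocks.

    A new block starts at the first instruction, at every branch target,
    and right after every branch. Branch targets are assumed to be valid
    indices into insns.
    """
    if not insns:
        return []
    leaders = {0}
    for i, (op, target) in enumerate(insns):
        if op in BRANCH_OPS:
            if target is not None:
                leaders.add(target)
            if i + 1 < len(insns):
                leaders.add(i + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(insns)]
    return [insns[s:e] for s, e in zip(starts, ends)]
```

Because every instruction after a branch starts a new block, two branches can never share a block, which is why emitting a second OP_BRA means creating another BB first.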
20:07karolherbst: pmoreau: what do you think, should I keep those bras or throw them away?
20:08pmoreau: Let me have a look
20:12karolherbst: mhh, I keep them
20:12karolherbst: the optimizer seems to kill other bras then
20:12karolherbst: or I have to get the edges right
20:16karolherbst: pmoreau: well, getting the edges right fixed the issue
20:17karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/7b8a0f9cb9f3f6eeb2a4c9571379dc0804d992dd
20:18karolherbst: there is still something wrong
20:40karolherbst: pmoreau: we also need SHadd
20:41karolherbst: well Hadd
20:41karolherbst: (x+y) >> 1
20:58karolherbst: imirkin: do you know how I can do an OP_BAR which blocks for all active work groups?
21:20levrano: you are major trolls and quite frankly enourmous idiots, but once and for all some need to make some clearer thoughts present into your brain, https://devblogs.nvidia.com/parallelforall/fast-dynamic-indexing-private-arrays-cuda/
21:21levrano: what the fuck are you picketing for? can you please read some stuff from the net, how pointers work?
21:32levrano: you may imagine an indirect load in both sides of the assignment containing index lanes which are the contents of the vector registers, if the contents are what they call uniformly distributed then 2.5 replays per warp i.e. 2.5 cycle latency to repoint stuff, dynamically uniform means consecutive i.e. continuous registers in regfile 1 2 3 4 5 6 .... where uniform has 1 cycle latency which means 3 3 3 3 3 ....
21:32levrano: and if all lanes are neither those, it has 64 replays 64 cycles latency
21:33pmoreau: karolherbst: Did you update your nextafter patch? Sorry for not looking at it earlier :-/
21:33karolherbst: pmoreau: yeah, I did
21:33karolherbst: now trying to fix the barrier test
21:33pmoreau: OK. Cause I found an issue in the old one.
21:34karolherbst: what issue?
21:34pmoreau: Ah nice, you fixed it
21:35pmoreau: For the first tbb/fbb pair: you had messed up the branching, but you fixed it in the new version
21:36pmoreau: karolherbst: Hum, I think there is still a bug: https://github.com/karolherbst/mesa/commit/a6adb5e3d2a46aee315e29c7e611eb0120967a2c#diff-95fbaa87866a7490a15d06af87e3e69bR3401 which BBs do you think you are linking here?
21:37karolherbst: fBB with the new fBB
21:37pmoreau: Yeah, you are correct
21:37pmoreau: I should have read a bit more
21:37karolherbst: well, I am sure the code works for nextafter(+- FLOAT_MAX, +-inf)
21:37karolherbst: because that's what the CTS is testing
21:38karolherbst: this hadd thing is annoying
21:39karolherbst: "The intermediate sum does not modulo overflow."
21:39pmoreau: All those fbb are a bit confusing TBH, but it looks good.
21:39pmoreau: You could merge with the current BB
21:39karolherbst: hadd(0x7fffffff, 0x7fffffff) = 0x7fffffff
21:40karolherbst: and I am sure my current code will return 0x3fffffff
21:40pmoreau: You should be able to merge the three last fbb together.
21:40karolherbst: pmoreau: don't know, I rather would keep those OP_BRA
21:41pmoreau: Oh wait, one of them is a tbb
21:41pmoreau: Nevermind then, it’s alright
21:41karolherbst: but I am sure barrier doesn't fail due to my faulty hadd implementation
21:42karolherbst: BARRIER test failed idx 81f2aca != 8c7f0aac
21:42pmoreau: So hadd, what does that one do.
21:42karolherbst: Returns (x + y) >> 1. The intermediate sum does not modulo overflow.
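The "does not modulo overflow" requirement is what rules out a plain 32-bit add followed by a shift. One well-known way to get the same result without a wider intermediate is `(x & y) + ((x ^ y) >> 1)`. The sketch below simulates unsigned 32-bit registers in Python; it only illustrates the arithmetic, it is not what either nouveau or the blob emits, and the function names are hypothetical:

```python
MASK32 = 0xFFFFFFFF


def hadd_naive_u32(x, y):
    """add + shr in 32 bits: the carry out of the add is lost."""
    return ((x + y) & MASK32) >> 1


def hadd_u32(x, y):
    """(x + y) >> 1 without overflowing 32 bits.

    x & y holds the bits set in both inputs (each contributes its full
    weight to the average); (x ^ y) >> 1 holds half of the bits set in
    only one input. Every intermediate value fits in 32 bits.
    """
    return (x & y) + ((x ^ y) >> 1)
```

For example, averaging 0xffffffff with itself gives 0xffffffff with `hadd_u32`, while the naive version drops the carry and returns 0x7fffffff. The signed variant follows the same idea with an arithmetic shift.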
21:42pmoreau: That’s an interesting op
21:43karolherbst: mhh, isn't there a subop for this on OP_ADD?
21:43mwk: isn't that usually called avg?
21:43pmoreau: True :-D
21:44karolherbst: don't we have a hw instruction for this? or do we really have to do add+shr
21:44pmoreau: I tend to not write my averages with a >> 1
21:44pmoreau: karolherbst: Try it on the blob?
21:45karolherbst: have to see if there is a ptx instruction for this, really don't want to do that cl -> ptx thing again
21:45karolherbst: but I am sure they don't, soo
21:46karolherbst: let's see
21:47karolherbst: okay, our value is too big
21:47karolherbst: expected is 0x81f2aca
21:47pmoreau: Why not write a CUDA kernel and compile it?
21:48pmoreau: It’s easier than the cl -> ptx thing.
21:48karolherbst: because I am sure that the test doesn't fail due to this
21:51pmoreau: Regarding the barrier, I have some OpControlBarrier, but I don’t think it’s working yet.
21:52karolherbst: well, the result is always 8c7f0aac
21:57karolherbst: we need OpVectorExtractDynamic
21:59pmoreau: karolherbst: The blob is doing something with the high bits: IMAD.U32.U32.HI R0, R0, 0x2, R0 (where R0 = x + y), and then it does the shift
21:59karolherbst: yeah, expected
21:59karolherbst: but weird they do it this way
22:01karolherbst: pmoreau: can you paste the entire sequence?
22:03karolherbst: pmoreau: do you know this error? input.cl:9:153: error: used type 'event_t' where arithmetic or pointer type is required
22:03pmoreau: Never seen that one
22:03pmoreau: Could you paste the OpenCL C code?
22:04pmoreau: CLOVER_DEBUG=clc CLOVER_DEBUG_FILE=somefile to dump it
22:04karolherbst: they do something like this: (event_t) 0
22:05karolherbst: as the last parameter to async_work_group_copy
22:06pmoreau: OK. I have an async_work_group_copy test in spvtes, which is not passing yet, but I didn’t have that issue.
22:06tstellar: karolherbst: That's technically a bug in the test.
22:06karolherbst: tstellar: ohh, okay
22:07karolherbst: I simply removed the cast and it compiled, but yeah
22:07karolherbst: pmoreau: we need to support OpTypeEvent :/ this event stuff sounds super messy
22:08tstellar: karolherbst: It's unfortunately a situation where the implementations were changed and not the test, so it became de facto part of the standard.
22:08pmoreau: Hum, how come I don’t have a OpTypeEvent
22:09pmoreau: karolherbst: It does sound messy, but it might be easier than what we’re thinking.
22:09karolherbst: I fix the other tests before touching this stuff
22:09karolherbst: like implementing Prefetch...
22:10karolherbst: I think we can just ignore it, right?
22:13pmoreau: Sounds good.
22:15karolherbst: test fails though :/
22:16karolherbst: RA fail maybe?
22:17karolherbst: pmoreau: FAILURE: __IMAGE_SUPPORT__ undefined even though images are supported :D
22:19pmoreau: Hum :-/ No clue
22:20karolherbst: #ifdef __IMAGE_SUPPORT__... stuff in the cl kernel
22:20karolherbst: pmoreau: how does the preprocessor work in our case anyway?
22:20karolherbst: does clover do preprocessing?
22:22pmoreau: I think it does a bit, haven’t really looked into it
22:30karolherbst: ohh, in mad.hi a b c, c gets added to the hi bits of the multiplication
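The mad.hi behaviour described above can be written out as plain arithmetic: take the upper 32 bits of the 64-bit product and add c to them. A small sketch for unsigned 32-bit operands; wrapping the final sum to 32 bits is an assumption here, and the function name is hypothetical:

```python
MASK32 = 0xFFFFFFFF


def mad_hi_u32(a, b, c):
    """High 32 bits of the 64-bit product a * b, plus c.

    Assumes unsigned 32-bit inputs and a result that wraps to 32 bits.
    """
    hi = ((a & MASK32) * (b & MASK32)) >> 32
    return (hi + c) & MASK32
```

With b = 2, the high half of the product is just the top bit of a, so `mad_hi_u32(a, 2, c)` adds that bit to c.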
22:31tstellar: karolherbst: clang usually adds those feature defines.
22:32pmoreau: karolherbst: Which test was it with the __IMAGE_SUPPORT__ ? I’ll have a look
22:32tstellar: karolherbst: https://github.com/llvm-mirror/clang/blob/master/lib/Basic/Targets/AMDGPU.cpp#L360
22:32karolherbst: tstellar: we compile cl kernels though, so clang shouldn't be involved at all, right?
22:33tstellar: karolherbst: But clover has the chance to add its own if it wants.
22:33karolherbst: yeah okay, it might be within llvm
22:33karolherbst: but I really don't know how all the glue code works here
22:33tstellar: karolherbst: What are you using to compile the cl code?
22:33karolherbst: llvm to spir-v
22:33pmoreau: tstellar: Is it similar to extension defines, like fp64?
22:34karolherbst: pmoreau: kernel_numeric_constants
22:35karolherbst: fixing my hadd implementation then
22:35karolherbst: pmoreau: I assume that is some kind of optimisation though
22:35karolherbst: what do they do on optlevel 0?
22:36tstellar: pmoreau: extensions are defined here: https://github.com/llvm-mirror/clang/blob/master/lib/Basic/Targets/AMDGPU.cpp#L360
22:36pmoreau: tstellar: I had to manually set the SupportedPragmas to get it to “support” doubles. It looked like it should be set by the frontend, but wasn’t in this case. Maybe something wrong in SPIRV-LLVM?
22:36tstellar: pmoreau: Which are converted to macros here: Basic/Targets/AMDGPU.h
22:37tstellar: Sorry I mean here: https://github.com/llvm-mirror/clang/blob/master/lib/Frontend/InitPreprocessor.cpp#L1044
22:37pmoreau: I see
22:39tstellar: pmoreau: Is SupportedPragmas something in SPIR-V ?
22:39karolherbst: pmoreau: :D it got optimised into a shladd
22:39pmoreau: tstellar: Here is what I needed to do https://phabricator.pmoreau.org/diffusion/MESA/browse/nouveau_spirv_support/src/gallium/state_trackers/clover/spirv/invocation.cpp;2cf57e2672a1e0bf40cb2b5328b12dfd416730c0$699-722
22:39karolherbst: pmoreau: which doesn't know the hi subop I think
22:40pmoreau: No, I meant clang::CompilerInstance.getProcessorOpts().SupportedPragmas
22:42tstellar: pmoreau: What version of clang are you using?
22:42pmoreau: SPIRV-LLVM, which is almost based on 3.6 (missing a few patches from 3.6 I think) https://github.com/KhronosGroup/SPIRV-LLVM/
22:44tstellar: pmoreau: Oh ok, that's pretty old. Did you ever look at clspv ?
22:44pmoreau: I quickly did but I haven’t tried using it.
22:44tstellar: pmoreau: A lot of stuff has been fixed in trunk since 3.6.
22:44pmoreau: That makes sense
22:44pmoreau: 3.6 is quite ancient
22:45pmoreau: clspv is generating Vulkan SPIR-V and might not be able to support everything OpenCL can, but it could be good enough for now.
22:45tstellar: pmoreau: Yeah AMD especially has been putting a lot of work into the frontend in the last year+
22:46pmoreau: There is also https://github.com/thewilsonator/llvm-target-spirv/ which I wanted to try, but haven’t yet. It’s a rebase of SPIRV-LLVM on top of trunk + some more work I think.
23:35karolherbst: pmoreau: can you check what nvidia is doing for sign()?