07:47imirkin: gah, looks like despite encodings existing, nvidia hates 64-bit ATOMS ops
07:47imirkin: at least on this GP108
07:50imirkin: at least xchg/cas work
07:51HdkR: What does hates mean this time?
07:53imirkin: i think that's the purest expression of hatred :)
07:53HdkR: huh, wonder if it doesn't exist on some chips
07:54imirkin: shared memory tends to get the shaft with stuff like this
07:54imirkin: maxwell is the first one where there's any ATOMS at all
07:54HdkR: I believe it should work on Maxwell Tegra at least?
07:54imirkin: for shared mem?
07:54imirkin: for gmem it works fine
07:55imirkin: maybe i'm doing the encoding wrong, but nvdisasm seems to like it
07:55imirkin: i guess i should figure out how to ptx
07:55HdkR: nvdisasm can be a bit...accepting
07:55imirkin: envydis is also happy with it, but that's just a mirror of nvdisasm :)
07:56HdkR: I've seen cases where it accepts an instruction just because it ignores some bits in the encoding :D
07:56HdkR: big oops
07:56imirkin: do you know offhand how i can write a ptx program for this?
07:56imirkin: i think i have stuff somewhere, but would have to find it
07:57imirkin: nevermind, found something
07:58imirkin: HdkR: https://hastebin.com/bakilefina.scss -- does that seem right?
07:58imirkin: (but adjust the sm_35 -> sm_50 or whatever)
07:58HdkR: I've never written a ptx function like that. Not sure :D
07:59HdkR: inline asm for ptx works, which is how I tinkered with it that way
08:02imirkin: this is from ptxgen, which was used for some of the earlier RE
08:05imirkin: HdkR: https://hastebin.com/behanadobu.less
08:05imirkin: so ... yeah. just cas + xchg for 64-bit atoms it would seem
08:05imirkin: this is with a sm_60 target
08:07HdkR: Oh! I was thinking just CAS. Yes, I don't believe the other ops are there :D
08:07imirkin: well, XCHG is there. but that's not such a huge lift once you have CAS :)
08:07HdkR: Transforming the data when it is 64bit is hard
08:08imirkin: i mean ... they do it for ssbo's
08:08imirkin: (aka g memory)
08:08imirkin: but not for shared
08:08imirkin: why is shared harder?
08:08imirkin: the joke is that if i were to use the g one and try to access the shared memory window, it would also bail :)
08:09imirkin: although that's probably not wildly surprising
08:09imirkin: anyways ... loop it is.
08:09imirkin: we already do it for the other gens
08:10HdkR: uhh. That ends up as implementation detail where it is easier for global :P
08:10imirkin: yeah, i mean i'm sure there's some reason
08:10imirkin: but you can't just sit there and say "64-bit is hard"
08:11imirkin: they do plenty of difficult things
08:11imirkin: when there's a will, there's a way
08:11HdkR: Yes there is a reason :)
08:11imirkin: i bet the memcpy() op they added in ampere or whatever wasn't trivial either
08:12imirkin: but there was reason enough to do it, so they did
08:12HdkR: Of course
08:12imirkin: whereas here i'm guessing the use-case was ... shall we say ... lighter
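The "loop it is" fallback above amounts to building the missing 64-bit operations out of compare-and-swap. A minimal host-side sketch in C11 follows, assuming nothing about the actual codegen lowering: the helper names `atomic_min_u64_via_cas` and `atomic_or_u64_via_cas` are hypothetical, and `<stdatomic.h>` stands in for the hardware CAS on shared memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit atomic min built from CAS only.
 * Returns the value that was stored before the operation. */
static uint64_t atomic_min_u64_via_cas(_Atomic uint64_t *addr, uint64_t val)
{
    assert(addr);
    uint64_t old = atomic_load(addr);
    uint64_t desired;
    do {
        desired = old < val ? old : val;
        /* On failure the CAS writes the current contents into `old`,
         * so the next iteration recomputes `desired` from fresh data. */
    } while (!atomic_compare_exchange_weak(addr, &old, desired));
    return old;
}

/* Hypothetical helper: same pattern for a 64-bit atomic OR. */
static uint64_t atomic_or_u64_via_cas(_Atomic uint64_t *addr, uint64_t val)
{
    assert(addr);
    uint64_t old = atomic_load(addr);
    while (!atomic_compare_exchange_weak(addr, &old, old | val)) {
        /* `old` was refreshed by the failed CAS; retry with it. */
    }
    return old;
}
```

Only the expression that computes the new value changes between operations, which is presumably why a single CAS primitive is enough to cover the rest.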
18:55karolherbst: imirkin: mhhh.. so I have this silly situation, one thread triggers nouveau_fence_trigger_work, but another thread adds a work item to the same fence :/
18:56karolherbst: which I think should never happen, but... maybe that's "okay" and we should just let the one thread finish the work list and the other one just calls the callbacks?
18:56karolherbst: any opinions?
18:59karolherbst: actually.. not sure this is even the same fence ...
18:59karolherbst: ehh, wait, has to
19:00RSpliet: I'd be very wary of trying to allow things that shouldn't happen when it comes to parallelism. It usually causes more trouble than it's worth down the line.
19:00karolherbst: maybe it's a false positive of tsan actually
19:01karolherbst: can't get it to assert :/
19:01karolherbst: mhh, let's see
19:03imirkin: karolherbst: depending on locking, could def happen
19:54karolherbst: imirkin: I am actually wondering if this is a "valid" scenario and how to deal with it
20:00imirkin: the fence is per-screen
20:00imirkin: work gets added to fences left and right
20:02karolherbst: mhh, right
20:02RSpliet: eh? the whole point of a fence is to have a serialising effect no?
20:02imirkin: RSpliet: i mean the nouveau_fence object
20:03imirkin: basically it has a list of work items to be run once a fence is "completed"
20:03imirkin: and the "current" fence is generally not emitted yet, so it's just a placeholder
20:04RSpliet: Right. Okay, so the question is where is the cut-off point where you can't schedule new work-items for launch *after* the fence. That makes more sense in my mind
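The "list of work items to be run once a fence is completed" idea can be sketched as below. This is a single-threaded illustration only, not the nouveau implementation: `struct fence`, `fence_add_work`, `fence_trigger_work` and `count_call` are hypothetical names, and locking is deliberately left out. The cut-off point being discussed is the `signalled` check, and in a threaded driver that check and the list update would need to be serialised against the trigger path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct work_item {
    void (*func)(void *data);
    void *data;
    struct work_item *next;
};

struct fence {
    bool signalled;           /* set once the fence has completed */
    struct work_item *head;   /* pending work, in submission order */
    struct work_item **tail;  /* where the next item gets linked */
};

static void fence_init(struct fence *f)
{
    f->signalled = false;
    f->head = NULL;
    f->tail = &f->head;
}

/* Queue work behind the fence, or run it right away if the fence
 * has already completed (the cut-off point). */
static bool fence_add_work(struct fence *f, void (*func)(void *), void *data)
{
    assert(f && func);
    if (f->signalled) {
        func(data);
        return true;
    }
    struct work_item *w = malloc(sizeof(*w));
    if (!w)
        return false;
    w->func = func;
    w->data = data;
    w->next = NULL;
    *f->tail = w;
    f->tail = &w->next;
    return true;
}

/* Mark the fence completed, then run and free every queued item. */
static void fence_trigger_work(struct fence *f)
{
    assert(f);
    f->signalled = true;
    struct work_item *w = f->head;
    f->head = NULL;
    f->tail = &f->head;
    while (w) {
        struct work_item *next = w->next;
        w->func(w->data);
        free(w);
        w = next;
    }
}

/* Example callback for trying the sketch out: bumps an int counter. */
static void count_call(void *data)
{
    ++*(int *)data;
}
```

Detaching the list before walking it means a callback that queues more work on the same fence takes the run-immediately path instead of modifying the list being iterated.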
20:06karolherbst: but tsan also thinks that nouveau_fence_work and nouveau_fence_trigger_work can actually race on the work item
20:06karolherbst: it doesn't make much sense
20:07karolherbst: I think..
20:07karolherbst: at least it complains about a race when freeing it
20:08karolherbst: tsan is also quite annoying to me anyway
20:10imirkin: yes, i also prefer the "stick head into sand" approach
20:10imirkin: that way i don't get worry lines ;)
20:12imirkin: [fun fact - apparently ostriches do that to grab small stones to help their digestion]
20:12karolherbst: funny thing is, I try to get the application to assert on that race, but apparently I can't figure out what the condition to that really is
20:17karolherbst: well.. at least I don't see races on the pushbuffer stuff anymore :)
20:18emersion: hi, i'm trying to figure out why this happens https://gitlab.freedesktop.org/drm/nouveau/-/issues/36
20:18emersion: any ideas where to start?
20:19imirkin: GF114 - that's an important detail
20:19imirkin: ok, so
20:19imirkin: you see how it prints a bunch of stuff, like addr: val
20:19imirkin: or addr: val1 -> val2
20:19imirkin: what it's saying is that in the "submit", the value either stayed the same or changed
20:19imirkin: presumably the old values were fine, so something in the new values upset the hardware
20:19imirkin: wtf are all these things anyways
20:24imirkin: urgh. the code moved to using symbolic names instead of hex values
20:24imirkin: much harder to find stuff now.
20:30imirkin: ok, so like the IMAGE stuff is https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/dispnv50/base827c.c#L28
20:30Lyude: fwiw, nowadays I usually rely on clangd to find references to symbols being used (through the youcompleteme plugin on vim, you almost certainly can find clangd-compatible plugins for whatever editor you use). i haven't tested it with macros but i'd assume it probably works a lot better than trying to grep things
20:31imirkin: Lyude: well, i prefer to read code. makes it harder. wtvr.
20:31imirkin: the symbolic stuff is nice too
20:32imirkin: emersion: and the "top" stuff is https://nvidia.github.io/open-gpu-doc/classes/display/cl827c.h i think
20:33imirkin: so like 0xa0 is the notifier offset changing
20:33emersion: i see
20:33imirkin: and the two images are SURFACE_SET things on different heads
20:34imirkin: there are only 2 on GF114
20:34imirkin: also, don't make the foolish assumption that GF114 implements the GF110 display stuff
20:34imirkin: (also don't make the equally foolish assumption that GF110 implements the GF110 display stuff ... it's just a wildly bad name)
20:35Lyude: gf119 is the only one that implements it all the way iirc
20:35imirkin: only GF119 implements the "GF110" display :)
20:35imirkin: (since GF117 doesn't have display at all)
20:35emersion: so SET_OFFSET and SET_STORAGE on these
20:36emersion: hm how do you know it's two different heads/outputs? i would've guessed these were the cursor and primary planes
20:36imirkin: those aren't images
20:36imirkin: at least not in the GPU's terminology
20:36emersion: "Image 0" :P
20:37imirkin: https://nvidia.github.io/open-gpu-doc/classes/display/cl827d.h -- look at SET_CONTROL_CURSOR
20:37imirkin: and SET_OFFSET_CURSOR
20:37emersion: oh, you mean the cursor is not an image
20:38imirkin: and this guy to update the cursor location out of sync with the regular stuff: https://nvidia.github.io/open-gpu-doc/classes/display/cl827a.h
20:38imirkin: in general all updates are written to a command fifo, which is interpreted one command at a time
20:38imirkin: (and there's usually a "go" command)
20:38imirkin: so it's very atomic already :)
20:38imirkin: except designed in 2006
20:39emersion: this is using the legacy interface btw
20:39imirkin: doesn't matter
20:39imirkin: driver's atomic
20:39emersion: yeah but i'd expect only one output to be updated at a time
20:39emersion: not two together
20:39imirkin: they likely are backed by the same fb
20:40imirkin: and that fb got updated? dunno
20:40HdkR: Lyude: btw, if you're not using nvim yet, it really improves perf with YCM
20:40emersion: hm, no, i had only one connected connector
20:41Lyude: HdkR: I am!
20:41emersion: i'll grab some drm debug logs with the latest drm-tip commit
20:41Lyude: normal vim with ycm is border-line unusable
20:42HdkR: yep :|
20:42imirkin: emersion: maybe that's what's going wrong? dunno. i could also be wrong - could be the overlay image?
20:42emersion: no, wlroots doesn't use these yet
20:42imirkin: oh hm, yeah
20:42imirkin: i think it's actually HEAD_SET_PRESENT_CONTROL
20:42emersion: also, the bug doesn't happen when i disable modifiers
20:42imirkin: ah no, definitely not
20:43imirkin: yeah, i don't think scanout works with modifiers.
20:43imirkin: it looks like both are linear surfaces
20:43emersion: ugh, nouveau does advertise modifiers
20:43imirkin: maybe it works. dunno
20:43ccr: "modifiers! get your fresh modifiers here!"
20:43imirkin: here's what i can say
20:44imirkin: #define NV827C_SURFACE_SET_STORAGE_MEMORY_LAYOUT 20:20
20:44imirkin: #define NV827C_SURFACE_SET_STORAGE_MEMORY_LAYOUT_BLOCKLINEAR (0x00000000)
20:44imirkin: #define NV827C_SURFACE_SET_STORAGE_MEMORY_LAYOUT_PITCH (0x00000001)
20:44imirkin: (pitch = linear, blocklinear = tiled)
20:44imirkin: and that bit is set to 1 in both the old and new image
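The `20:20` in the define above is a `hi:lo` bit range, so the check being described is "extract bit 20 of the SURFACE_SET_STORAGE value and compare it with the PITCH enum". A small sketch in C, assuming the value has already been pulled out of the log: `field_get` and `surface_is_pitch` are hypothetical helpers, and the range is split into two macros here only because `20:20` is not a valid C expression by itself. The enum values 0 and 1 are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of the "20:20" (hi:lo) range into two macros. */
#define MEMORY_LAYOUT_HI          20u
#define MEMORY_LAYOUT_LO          20u
#define MEMORY_LAYOUT_BLOCKLINEAR 0x00000000u  /* tiled */
#define MEMORY_LAYOUT_PITCH       0x00000001u  /* linear */

/* Extract bits hi..lo (inclusive) from a 32-bit method value. */
static uint32_t field_get(uint32_t value, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    /* Avoid shifting by 32, which is undefined for a 32-bit type. */
    uint32_t mask = width == 32 ? 0xffffffffu : ((1u << width) - 1u);
    return (value >> lo) & mask;
}

/* Returns 1 if the value selects a pitch (linear) layout. */
static int surface_is_pitch(uint32_t set_storage)
{
    return field_get(set_storage, MEMORY_LAYOUT_HI, MEMORY_LAYOUT_LO)
           == MEMORY_LAYOUT_PITCH;
}
```

Running both the old and the new value from a log line through `surface_is_pitch` is one way to confirm the observation that neither surface is block-linear.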
20:44Lyude: oh, that reminds me while we're on the topic of modifiers
20:44imirkin: so ...
20:44imirkin: something sounds off
20:44Lyude: cursor modifiers don't work and aren't supported in hw, will make sure to send the patch I've got sitting around for that to the ml today
20:44Lyude: just so no one tries that
20:45emersion: Lyude: got a link to the patch?
20:45Lyude: emersion: lemme link you to my igt wip kernel branch, it's got a couple of fixes
20:45emersion: because nouveau advertises non-linear modifiers for the cursor plane
20:45Lyude: yeah-my fix is to stop it from doing that :P
20:45emersion: excellent - i found it weird
20:46Lyude: i got the main stuff off my work plate today so i'll try to send most of this stuff out today
20:48emersion: thanks for the pointers imirkin
20:48imirkin: but -- i'm not 100% sure that modifiers work
20:48imirkin: they might work on some hardware but not other hardware
20:48imirkin: skeggsb would know for sure
20:49imirkin: but he's in eastern australia, so probably asleep
20:49Lyude: they do work
20:49Lyude: actually - kms_plane_formats is one of the igt tests I have passing with flying colors
20:49imirkin: Lyude: which hw?
20:50Lyude: imirkin: should be anything from ~kepler-pascal, note though I am aware of some weird issues with small overlay planes on kepler
20:50emersion: here's what i see with my card https://drmdb.emersion.fr/devices/1d62521d6f8c
20:50Lyude: but i don't think that's unique to modifiers
20:50imirkin: Lyude: emersion is on GF114
20:50Lyude: oh yeah i have no idea about that then
20:50imirkin: Lyude: yeah, i think i've seen problems with less than 32x32 or something
20:50Lyude: especially since I think they still used the legacy linear modifiers
20:50Lyude: imirkin: yeah that's it
20:51emersion: okay, so maybe i can try to prune modifiers until it works
20:51Lyude: (by legacy linear modifiers I'm referring to the 16X16 stuff)
20:51imirkin: i dunno what that is
20:51imirkin: is that just regular modifiers?
20:51imirkin: and there's some fancy new hotness?
20:52Lyude: imirkin: well i'm not super sure about the 16x16 stuff because none of my hardware supports it, i've just seen it mentioned in fourcc and seen it mentioned in nvidia docs. the "new" modifiers aren't actually that new, and should even work on fermi iirc
20:52imirkin: oh, it's just a different representation for the same thing, i think
20:53Lyude: the new ones specify the pagetype, block height, etc. etc.
20:53imirkin: i.e. as far as the hw is concerned, it's all the same
20:53Lyude: imirkin: i thought that was it but i wasn't sure because I saw the two mentioned side by side in some display docs
20:53imirkin: but now it's better represented in the modifier thing
20:54Lyude: emersion: btw-if you're very bored, no one has re'd the crc stuff for pre-gf119
20:55Lyude: it'll probably be harder because the state cache on earlier cards like that doesn't match up precisely with the method numbers, but hey if you're up for the challenge :P
20:56imirkin: yeah, getting them to cough up the docs for it was pretty nice :)
20:56imirkin: did you ask about the earlier classes?
20:56Lyude: imirkin: we've been asking about them for ages
20:56imirkin: yeah figured
20:56imirkin: "o well"
20:57Lyude: i have a weird feeling there's some big difference with how everything cl5xx[acde] was designed vs everything after, because it seems weird it would be so difficult for them to get those docs
20:57Lyude: but maybe they're just lazy, *shrugs*
20:58imirkin: yeah, i suspect the crc mechanism is totally different
20:58imirkin: and may involve non-evo
20:58Lyude: yeah......... i have some other reasons to suspect that as well
21:00Lyude: if we're lucky and it's not: be suspicious of state cache registers containing ffffff40 or ffffff00
21:00imirkin: not to mention the ides of march...
21:04Lyude: probably also worth checking if it's not the same CRC mechanism used on pre-evo GPUs
21:04imirkin: do we know what *that* is?
21:04Lyude: (i hope it's not, because then it's more or less useless :\)
21:05Lyude: imirkin: yes actually
21:05Lyude: i think it's in some docs we have
21:05imirkin: fun. didn't know that.
21:05Lyude: but it's really not useful, it's a lot like matrox (I suspect it might even be copied to or from there) CRCs where you have to select which bits from the RGB channels you want and you more or less can only look at one channel at a time
21:10Lyude:is quite curious how they ever did hw verification with this...
21:13imirkin: i dunno about you, but i'd just be super-happy my plug-in board didn't fry the motherboard when i plugged it in
21:14imirkin: hw is hard :)
21:18RSpliet: Lyude: RE kms_plane_formats: is flying colours a good thing? *drum roll*
21:19Lyude: no it's bad
21:19Lyude: it's pretty rad tbh
21:19Lyude: note i did have to fix a few bugs to get it there, which is where a lot of the commits on that branch came from :P
21:20RSpliet: Story of a sw engineer. There's always a handful of hidden unexpected bugs between where you stand and where you want to go
21:20RSpliet: But I was visualising shattered CRTs with colours flying all over the place
21:20Lyude: I will say though, I was pretty pleasantly surprised by how many things were not broken. I've got kms-pipe-color totally passing, and kms-cursor-crc is close (one test is failing because of some pretty bizarre transparency issues)
21:20imirkin: "wow, this is NOT broken?! what a surprise!"
21:21Lyude: yeah!! lol, you guys did a pretty great job manually verifying all of this stuff over the ages
21:21RSpliet: that's the treatment I give to most code that 1) I don't know yet, or 2) I've written myself
21:21imirkin: "what idiot wrote this broken bs?! .... o wait, that was me"
21:21RSpliet: Think display code is mostly on Ben :-)
21:21imirkin: nah, the stuff he writes usually works
21:22imirkin: (at least by the time he pushes it)
21:23imirkin: if i can fault him for one thing, it's probably perfectionism.
21:26RSpliet: Ironic :-D But can tell you first hand that perfectionism is a pain.
21:27RSpliet: Don't think I can count on two hands the times it's led to analysis paralysis, and resulting in no work getting done at all. And I can count in binary...
21:27imirkin: there's a pithy saying ... that i'm blanking on
21:27imirkin: perfect is the enemy of good
21:27imirkin: or something like that
21:27RSpliet: yeah I heard variations of that
21:28Lyude: RSpliet: good to know it's not just me lol
21:28RSpliet: Lyude: it's an entry requirement for the nouveau team :-P
21:28imirkin: dunno, i'm definitely a "good enough for gov't work" kinda guy
21:28imirkin: (good thing i don't work in gov't?)
21:29RSpliet: turns out driver devs get a special kind of evil for code that isn't perfect. Well, at least on the phoronix forums. Although arguably that's not a place to go in the first place, that's more on them than on the devs.
21:29imirkin: lol yeah, i ignore all that
21:29Lyude: are people on phoronix reviewing nouveau code
21:29imirkin: funny to read sometimes
21:29imirkin: lots of just vitriolic stuff too
21:29RSpliet: Lyude: there's always one grumpapuss who has to pour some vitriol whenever nouveau gets mentioned
21:30imirkin: e.g. https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-nvidia-linux-nouveau/1231743-open-source-nvidia-support-for-recent-gpus-is-poor-but-now-you-can-fake-it-for-testing
21:30imirkin: in response to my adding drm-shim support
21:30RSpliet: yeah, there's people who lift you up, and people who kick you down. Take-away message, don't be like that person, be in the first group. Better for your blood pressure :-P
21:30imirkin: (why he thought that was newsworthy in the first place? couldn't tell ya)
21:31RSpliet: Lyude: unless with code review you mean someone actually doing something useful instead of just bashing
21:31RSpliet: in which case absolutely not!
21:32Lyude: imirkin: you know that one time when I was first starting to work on mesa and you were helping me figure things out, I remember writing some extension for making it so that polygon vertices would be interpreted as squares. I mostly remember because phoronix wrote a news article on it that was like "WOW, what could Red Hat be planning with these extensions!? What magic future awaits??"
21:32Lyude: my dude, i'm just learning mesa, that's it
21:32Lyude: that was the one
21:34RSpliet: At times it's a bit like PAnon, but heck, it's the best we get when it comes to summarising events in OSS world
21:34imirkin: i guess the events just aren't that interesting
21:34imirkin: so he's gotta dig deep :)
21:36RSpliet: Exactly. He's a nice enough dude, just trying to make ends meet and write things up the best he can for a wider audience
21:38Lyude: yeah the phoronix guy is nice, can't fault him on that
21:40karolherbst: yes, just the commenters are trolls
21:40imirkin: yeah, michael seems like a nice enough guy. the stuff he says about nouveau is generally accurate
21:40karolherbst: but yeah, Michael is quite okay
21:40karolherbst: and he accepts if he is wrong and stuff
21:41karolherbst: not all journalists do this
21:43imirkin: emersion: just noticed this in your log
21:43imirkin: [ 503.000786] nouveau 0000:06:00.0: DRM: Moving pinned object 000000006921082c!
21:43imirkin: [ 503.092113] nouveau 0000:06:00.0: DRM: bo 000000006921082c pinned elsewhere: 0x00000001 vs 0x00000002
21:43imirkin: that's bad.
21:43emersion: how bad?
21:43imirkin: it generally means you're trying to do some sort of sharing
21:43imirkin: that doesn't work
21:43imirkin: e.g. a BO is pinned in vram
21:43imirkin: but you're trying to export it as a dma-buf, which means it has to sit in GART (aka sysmem)
21:44Lyude: imirkin: check drm-tip
21:44Lyude: there's a fix for that from me already
21:44imirkin: Lyude: can that be fixed?
21:44Lyude: (I can link you to it as well)
21:44imirkin: it's a logical impossibility
21:44emersion: imirkin: exporting as a dmabuf doesn't necessarily mean it will be shared
21:44Lyude: wait-oh, missed the "pinned elsewhere" bit. I never got that, but I did definitely fix an issue with ttm accidentally evicting pinned buffers that was breaking buffer eviction for nouveau
21:44imirkin: emersion: right, exporting as dma-buf isn't the action
21:45imirkin: emersion: probably yeah, importing it elsewhere
21:45emersion: it may be related to the cursor plane
21:45imirkin: anyways, it's not bad in that it won't cause crashes
21:45imirkin: but it'll just lead to black screens
21:45imirkin: and/or frozen contents
21:46emersion: wlroots allocates the cursor plane with USE_SCANOUT via GBM
21:46Lyude: https://cgit.freedesktop.org/drm-tip/commit/?id=70612d0e121e55ea3c057c526bf7374da41aa2f0 is the fix I'm talking about fwiw
21:46imirkin: i don't remember if scanout from system memory is possible
21:46karolherbst: imirkin: has to, no?
21:46imirkin: karolherbst: why?
21:46Lyude: iirc I don't think it is
21:46emersion: and then nouveau has to do some migration when it realizes it's used on the cursor plane
21:46Lyude: I asked skeggsb about this previously
21:46karolherbst: well, GPUs with stolen RAM also have to support it
21:46karolherbst: but I guess that's different?
21:46Lyude: you know there is a cap bit about this
21:47Lyude: that i never bothered looking at
21:47imirkin: karolherbst: GPUs with stolen RAM have a bit which is treated as "VRAM"
21:47imirkin: and you can only scan out from there
21:47emersion: Lyude: PREFER_SHADOW?
21:47karolherbst: sounds like cheating
21:47Lyude: but also, I don't know if that's relevant
21:47Lyude: emersion: no, it's uhhhh
21:47Lyude: hang on i can find a link to it
21:47emersion: anyways, wlroots doesn't scanout system memory
21:48emersion: there are just some migrations going on with cursor plane storage
21:48imirkin: emersion: i mean a "buffer object" which is located in system memory, vs vram
21:48imirkin: (modern) GPUs have VMs which abstract this internally
21:49emersion: GART → VRAM migration
21:49imirkin: so a particular GPU VA might refer to memory backed by vram or it's a pci-bus away
21:49imirkin: linear surfaces in vram should work totally fine
21:50karolherbst: actually for prime it would make sense if the GPU can scanout to whatever memory
21:50Lyude: yeah - i think if it does support scanout from gart it probably needs to be mapped to the pushbuffer
21:50imirkin: karolherbst: yeah, it would. but it's not allowed. this is why you end up with black screens.
21:50karolherbst: ehh :/
21:50karolherbst: I see
21:50Lyude: also - note those caps aren't indicative of anything if they aren't set to 1. I know there's some caps there that nvidia has confirmed never got implemented in any hardware
21:52imirkin: (because one part of the stack thinks that the buffer has moved. but it hasn't.)
21:55imirkin: karolherbst: in unrelated news - you good with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8461 as-is?
21:56karolherbst: there was one thing
21:56karolherbst: ahh no, I thought PIPE_CAP_GLSL_ZERO_INIT defaulted to something else
21:57karolherbst: and I think PIPE_CAP_SHADER_ATOMIC_INT64 is good at least for pascal as I am sure I did test int64 atomics in CL
21:57karolherbst: and made sure all tests are passing
21:57karolherbst: but not sure if there are patches missing
21:57karolherbst: imirkin: PIPE_CAP_SHADER_ATOMIC_INT64 is for buffers only, right?
21:57karolherbst: uhm.. ssbos
21:58imirkin: karolherbst: i have an impl already
21:58karolherbst: I know
21:58imirkin: it's in a different MR
21:58imirkin: need to make it work for shared
21:58imirkin: will do that tonight
21:58karolherbst: I think I have a patch for shared
21:58imirkin: it's not hard
21:58imirkin: oh, where?
21:58imirkin: (and why?)
21:59imirkin: basically i just need to write a simple loop with CAS. pretty easy.
21:59karolherbst: what do you mean with pointer?
21:59imirkin: pointer to the change
22:00karolherbst: ahh, right,
22:00karolherbst: imirkin: we also have PIPE_SHADER_CAP_INT64_ATOMICS....
22:00karolherbst: why not
22:00imirkin: the more the merrier
22:01karolherbst: mhhh, so I wrote a nir pass, but I also wrote a codegen version, which I am sure I have somewhere.. just must be an older branch
22:01imirkin: ok. well, don't worry too much about it
22:01imirkin: it's like 5 lines of code
22:01imirkin: happy to reuse what you did, but equally happy to not :)
22:01karolherbst: yeah.. well
22:01karolherbst: it's more than that
22:01karolherbst: because.. CFG and stuff
22:01emersion: can someone move this to drm/nouveau btw? https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/-/issues/480
22:01imirkin: karolherbst: yeah, but i can copy from the existing stuff
22:02imirkin: emersion: done
22:02karolherbst: mhh.. probably lost the commit or something
22:02karolherbst: oh well
22:03karolherbst: but I know I wrote that commit :)
22:03imirkin: not sure i'll get to it today anyways, but hopefully
22:03imirkin: it's the old new year today
22:04imirkin: [if you don't know what that means, you're better off that way]
22:04karolherbst: imirkin: btw, you'll need this: https://github.com/karolherbst/mesa/commit/c7cf024fb23a8f824ab304e35090206303e0e141
22:04karolherbst: ehh wait
22:04karolherbst: no, you won't :D
22:04imirkin: CAS and XCHG actually work fine
22:04imirkin: everything else bails
22:05imirkin: (coz the instructions aren't there, even though nvdisasm is fine with it. but ptxas doesn't use them, so ... probably the hardware is right in the end)
22:06karolherbst: yeah, it all works just fine
22:07karolherbst: imirkin: but how does ptxas not use them?
22:07karolherbst: what does it do instead?
22:09imirkin: i mean
22:09imirkin: ptxas uses cas/xchg
22:09imirkin: rather than e.g. atom min
22:09imirkin: even though nvdisasm is happy with ATOM.MIN
22:09imirkin: er, ATOMS.64.MIN
22:09imirkin: or whatever
22:10karolherbst: mhhh, interesting
22:10karolherbst: ohh well, right
22:11karolherbst: shared memory doesn't support it
22:11karolherbst: so you have to lower all variants except cas/xchg
22:11karolherbst: only global has 64-bit atomics
22:11imirkin: /*0130*/ ATOMS.OR.U64 R0, [0x10], R2;
22:11karolherbst: on the hw shared int 64 atomics just fail
22:11karolherbst: I already tried
22:12imirkin: yeah, you get illegal instr encodings
22:12imirkin: i tried too =]
22:15imirkin: but nvdisasm is happy with those. oh well.
22:16imirkin: i already complained about this yesterday :)
22:18imirkin: karolherbst: so ... good with the caps update?
22:18imirkin: cool thanks