01:59karolherbst: imirkin: are you up for pain? highly optimized volta code: https://gist.github.com/karolherbst/60335c32993c706751f4361e2013fef2
01:59karolherbst: do you like it?
02:01karolherbst: I am not kidding, that was compiled with O3
02:01karolherbst: and the source has 3 shifts, 3 adds and one mad :)
02:05HdkR: karolherbst: Hm?
02:05imirkin: IMAD.U32.X R2, R2, c[0x0][0x168], R0, P1
02:05imirkin: wonder what that does.
02:05karolherbst: it is in predicates now
02:05imirkin: from the predicate?
02:05karolherbst: so we have two with iadd
02:05karolherbst: two in and two out
02:06karolherbst: I think
02:06karolherbst: although only one is shown? weird
02:06imirkin: gotta love the mov: IMAD.U32 R3, RZ, RZ, c[0x0][0x16c];
02:06karolherbst: but those RZs...
02:06karolherbst: they do that a lot
02:06skeggsb: they interleave both normal MOV and those IMAD-MOVs
02:06HdkR: heh, those ones are fun :P
02:06imirkin: this is where it helps to have a model of the internal hw block availability
02:07skeggsb: imirkin: btw, what does iset do with carry? need to figure out how to make volta do that :P
02:07karolherbst: imirkin: I love how they prefer imad over shl :)
02:07karolherbst: because, why not?
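The IMAD tricks mentioned above (IMAD with two RZ operands acting as a mov, and IMAD standing in for shl) can be modelled in a few lines. This is a minimal sketch of the arithmetic only, assuming a 32-bit multiply-add that wraps on overflow; `imad_u32`, `mov_via_imad` and `shl_via_imad` are hypothetical names for illustration, and nothing here models the real encoding, the `.X` variant or the predicate carry operands.

```python
MASK32 = 0xFFFFFFFF

def imad_u32(a, b, c):
    """Model of a 32-bit integer multiply-add: (a * b + c) wrapped to 32 bits."""
    return (a * b + c) & MASK32

def mov_via_imad(value):
    """0 * 0 + value: the multiply contributes nothing, so this is a plain move."""
    return imad_u32(0, 0, value)

def shl_via_imad(x, shift):
    """x * 2**shift + 0 gives the same 32-bit result as x << shift."""
    return imad_u32(x, 1 << shift, 0)
```

With RZ reading as zero, `IMAD.U32 R3, RZ, RZ, c[0x0][0x16c]` corresponds to `mov_via_imad` applied to the constant buffer value.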
02:07imirkin: skeggsb: do we ever use that?
02:07imirkin: oh right... i think we do somewhere ...
02:07imirkin: hold up
02:08skeggsb: 49: add u32 $c0 $r10 neg $r0 (16)
02:08skeggsb: 50: set u8 $p0 neu u32 $r11 $r1 $c0 (16)
02:08skeggsb: apparently :P
02:08imirkin: yeah. i have a comment somewhere.
02:08imirkin: just have to find where :)
02:08karolherbst: and apparently every shader starts with a NOP
02:08karolherbst: skeggsb: any idea why that is?
02:09skeggsb: no, i've noticed it too, but it's not all shaders either
02:09karolherbst: skeggsb: maybe movs r c have all 0x20 aligned addresses?
02:10karolherbst: but uhm
02:10karolherbst: that doesn't make much sense either
02:10imirkin: skeggsb: hmmmm ... i'm only seeing it in nv50 lowering
02:10imirkin: skeggsb: can you provide more info, like the full source shader + output?
02:10imirkin: i.e. the tgsi
02:11imirkin: on nv50, it's in handleSLCT
02:11imirkin: oh wait. crap. i was grepping for the wrong thing
02:11imirkin: theeere it is
02:12imirkin: ok. so if you have a 64-bit compare
02:12imirkin: of, say, unsigned 64-bit integers
02:12imirkin: you can split it up into 2 pieces
02:12imirkin: compare the high, and then compare the low
02:12imirkin: if the high is different, it doesn't matter what the low is
02:13HdkR: How far along is Volta at this point? Would be fun to see my TV running Nouveau
02:13imirkin: so ... we subtract the high words
02:13HdkR: Shadergen being the stopper atm or something else?
02:13imirkin: and use the carry bit of that
02:14imirkin: to feed into the comparison of the low words
02:14imirkin: skeggsb: might be faster to look at NVC0LegalizeSSA::handleSET :)
02:14imirkin: [note that it only gets called for 64-bit int types]
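The split compare described above can be modelled with a borrow chain. This is a minimal sketch, assuming unsigned 64-bit operands and assuming the borrow produced by subtracting one pair of 32-bit words is consumed by the operation on the other pair. The low words are subtracted first here, as in ordinary multi-word subtraction. `split64`, `sub_borrow32` and `ult64` are hypothetical names for illustration; the sketch models the arithmetic only, not the exact semantics of the hardware set-with-carry instruction or of NVC0LegalizeSSA::handleSET.

```python
MASK32 = 0xFFFFFFFF

def split64(x):
    """Return the (low, high) 32-bit words of an unsigned 64-bit value."""
    return x & MASK32, (x >> 32) & MASK32

def sub_borrow32(a, b, borrow_in=0):
    """32-bit subtract with borrow; returns (wrapped result, borrow out)."""
    diff = a - b - borrow_in
    return diff & MASK32, 1 if diff < 0 else 0

def ult64(a, b):
    """Unsigned 64-bit a < b, using only 32-bit subtractions."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    # The first subtraction is done only for its borrow; the result is dropped.
    _, borrow = sub_borrow32(a_lo, b_lo)
    # The second subtraction consumes that borrow; its borrow out is the answer.
    _, borrow = sub_borrow32(a_hi, b_hi, borrow)
    return borrow == 1
```

If the high words differ, the final borrow is decided by them alone, which matches the remark that the low words then do not matter.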
02:14skeggsb: imirkin: thank you :) that gives me a good start
02:14karolherbst: ohh I remember something like that
02:14skeggsb: and yeah, it's coming from arb_gpu_shader_int64 tests
02:15skeggsb: HdkR: codegen really, i'm *nearly* there.. some show-stopping issues remaining though
02:15skeggsb: HdkR: and nvidia to publicly release firmware
02:15imirkin: you don't really have to replicate this behavior - you just have to deal with 64-bit compares
02:15HdkR: skeggsb: Currently you're just stripping from the blob I presume?
02:15imirkin: skeggsb: my scanner's good enough, right?
02:16HdkR: Stripping firmwares*
02:16imirkin: oh, probably not actually
02:16imirkin: it gets the netlist and whatnot, but not the various pmu & co firmware (in a detectable manner)
02:16imirkin: HdkR: your TV has a volta in it?
02:16skeggsb: imirkin: it'll miss the signatures too, i believe
02:17imirkin: do you even need those... :p
02:17skeggsb: HdkR: let's just say that i have them, but people will have to wait for nvidia :P
02:17HdkR: imirkin: hah. I don't think the Titan V would fit in my TV ;)
02:18imirkin: HdkR: maybe the other way 'round?
02:18HdkR: Has a firmware even made it out since Alexandre left?
02:18imirkin: trying to remember if GP108 was after he left or not
02:18imirkin: i think it was.
02:19skeggsb: yeah, it's only gp108 so far though
02:19imirkin: took forever.
02:19karolherbst: some stuff, yes
02:19karolherbst: skeggsb: weren't there even some fixes as well?
02:19karolherbst: or was that still when alex was there
02:19skeggsb: that was while alex was there, gp107 was messed up for the initial release iirc
02:21HdkR: skeggsb: Simple shaders seem to be working? Or is it all shaders failing atm?
02:21skeggsb: i've got the majority of piglit shaders running
02:21karolherbst: skeggsb: well I guess you could post patches to the ML before firmware is released or something
02:22skeggsb: yeah, there'll be nothing blocking me releasing patches once they're done, even without fw
02:22skeggsb: i'll probably need imirkin to smack me around a bit before they're mergeable :P
02:22karolherbst: well with all the volta stuff going on I would actually prefer that we take a deeper look, because things can always break in a subtle manner :(
02:22karolherbst: yeah :D
02:22karolherbst: I try to help him with that :p
02:22skeggsb: hehe, thanks
02:23karolherbst: although I would actually be able to test your patches :p
02:28karolherbst: uhm, maybe I should fix the arb_compute_variable_group_size stuff with nir...
02:37HdkR: skeggsb: Anything bewildering that is breaking or is it mainly just uninvestigated breakages?
02:39HdkR:is excited about it :)
03:57skeggsb: HdkR: the main one is something odd with texturing that i haven't resolved yet, concentrating on getting codegen stuff as correct as possible first.. had a couple of failed attempts at the texturing thing so far
03:59imirkin: skeggsb: what's the issue?
03:59skeggsb: imirkin: unknown at this point, it's *definitely* not in codegen though
03:59skeggsb: even used nvidia's shaders :P
03:59imirkin: yeah, i'm sure it's not
04:00imirkin: esp if you're not doing anything funny
04:00imirkin: like textureGrad on a 2d array shadow texture
04:01HdkR: Texturing would be useful yes
04:01skeggsb: imirkin: even weird behaviour with simple stuff, yeah
04:02skeggsb: can't say much intelligent about it yet, as i haven't really nailed down where it's going wrong so far :P
04:03skeggsb: tic/tsc formats look compatible, if not the same. i *think* the handle mapping stuff is the same too, but not 100% sure
04:04skeggsb: i even suspected perhaps non-linked tsc stuff didn't work on volta, but hacked us to linked mode and no change
04:05skeggsb: i'll get back to that at some point, i switch between the two depending how much codegen is annoying me
04:05HdkR: Does it end up looking like corrupt textures or something dumb like all black?
04:05skeggsb: hard to say with piglit :P
04:06imirkin: debugging this stuff is not an exact science
04:06HdkR: lol yep.
04:49rhyskidd: does nv hardware verify the *entirety* of a vbios matches the cryptographic hash?
04:53rhyskidd: curious about this local exe nvidia released as their DisplayPort firmware updater
04:53rhyskidd: apparently dumps the vbios, overwrites the UEFI GOP table, does any fix ups, and then re-uploads the modified VBIOS
04:54rhyskidd: which all happens locally -- e.g. if the overwritten UEFI section is part of the data cryptographically hashed, the tool needs the private key ...
04:55rhyskidd: quick scan of the binary doesn't show any "BEGIN RSA PRIVATE KEY" though
12:21karolherbst: imirkin: I triggered that bug again, where GlobalCSE wants to insert a phi instruction after a join ;(
12:22karolherbst: imirkin: do you maybe know what the reason was to insert Instructions before an OP_JOIN inside GlobalCSE?
12:24karolherbst: mhh, seems like that code was already there like 6 years ago
12:29karolherbst: although no, we can't do that
12:29karolherbst: if a phi has two sources and both are phis
12:29karolherbst: those can't be considered equal even if both of their sources are the same
12:29karolherbst: imirkin: would you agree on that one?
12:29karolherbst: (well only exception would be both phis are inside the same bb)
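The rule proposed above can be written down as a conservative equality check. This is a minimal sketch of that proposal only, not of what GlobalCSE in Mesa actually does; the `Phi` class, its fields and `phis_mergeable` are hypothetical names for illustration. The assumption is that phi sources are ordered by incoming edge, so two phis in different blocks can list the same sources and still pick different ones at run time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phi:
    block: str      # name of the basic block that holds the phi
    sources: tuple  # incoming values, one per predecessor edge

def phis_mergeable(a, b):
    """Treat two phis as equal only if they sit in the same block and list the same sources."""
    return a.block == b.block and a.sources == b.sources
```

Under this check, a phi whose two sources are phis from different blocks is never folded, even when those source phis look identical.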
13:16pendingchaos: karolherbst: think you could give R-b(s) for some of these small cleanup/bugfix patches sometime: https://gist.github.com/pendingchaos/ab6e41bdf80239540f8ca094558d91fb ?
13:17karolherbst: pendingchaos: do you have a test for the first patch?
13:17pendingchaos: no, I could probably try to see if I can make one though
13:19karolherbst: okay, that code won't work
13:20karolherbst: pendingchaos: you found this inside LoadPropagation, right?
13:20karolherbst: check the OP_SUB case ;)
13:20karolherbst: ohh although
13:20karolherbst: there the negative value is checked
13:22karolherbst: pendingchaos: "TargetNVC0::insnCanLoadOffset()" is r-by me
13:22karolherbst: but please change the commit message a bit
13:22karolherbst: and mention that the nv50 code already does this
13:23pendingchaos: how should I change the commit message other than mentioning that the nv50 code already does it?
13:25karolherbst: uhm, I only meant the nv50 part
13:32pendingchaos: can you push it for me if the new commit message looks good: https://gist.githubusercontent.com/pendingchaos/8652fad604711f470a0ce8fe4ccf3ba5/raw/d9ab6f95c4c5e8a01c1107b76b5fbc7014b89206/nv50-ir-fix-TargetNVC0-insnCanLoadOffset.patch ?
13:38karolherbst: yeah. I am currently bisecting something, so I will do it later today if I don't forget :)
13:39karolherbst: pendingchaos: is NVC0_CB_AUX_TEX_INFO defined in a magic way? I somehow couldn't find the definition via a simple search
13:39karolherbst: although, that should be there, I am sure
13:39pendingchaos: it's in nvc0_context.h
13:39pendingchaos: line 118
13:40karolherbst: mhh, I guess I messed up my grep then
13:42karolherbst: yeah, that patch is also r-by me
13:43karolherbst: I will try to take a deeper look at the other two patches after I am done bisecting stuff here
16:15karolherbst: pendingchaos: I will remove the "fixes" comment from the one patch, because it doesn't actually fix anything. Fixes is usually used in commits to point to the faulty commit that introduced a bug, which the new commit then fixes
16:16karolherbst: mainly helpful for stable tree maintainers or to add more value to your commit message
16:16pendingchaos: sounds fine
16:18karolherbst: pendingchaos: I will add a fixes to your insnCanLoadOffset patch though, because there it actually makes sense :) and I am sure that has been broken since forever
16:19karolherbst: (not that I think it triggers in any real world application, but...)
16:20karolherbst: wow weird
16:20karolherbst: pendingchaos: https://cgit.freedesktop.org/mesa/mesa/commit/?id=37b67db6ae34fb6586d640a7a1b6232f091dd812
16:20karolherbst: that method was actually added later
16:20karolherbst: than the nv50 version
16:21karolherbst: I was already wondering why they differ