00:32 imirkin: PaulePanter: it looks like the problem, btw, is that your AML has no concept of how long the underlying ROM really is
00:33 imirkin: PaulePanter: i dunno if your fix makes sense... but the existing _ROM code looks pretty fubar too
00:37 imirkin: PaulePanter: here's an example of a "real" one https://bugs.freedesktop.org/show_bug.cgi?id=99372#c6
00:37 imirkin: not that i'm saying it's the definition of perfection
00:37 imirkin: in fact the ones that end up in bugs generally aren't ... :)
00:38 imirkin: and another one: https://bugs.freedesktop.org/show_bug.cgi?id=93778#c1
13:02 captainchris: hi everybody
13:11 captainchris: can i use DRI with the nouveau driver?
23:10 pendingchaos: karolherbst: this thread about XMAD seems interesting: https://devtalk.nvidia.com/default/topic/980740/cuda-programming-and-performance/xmad-meaning/
23:15 karolherbst: pendingchaos: yeah
23:15 karolherbst: I already read it :)
23:15 HdkR: Woo xmad? :)
23:16 karolherbst: pendingchaos: xmad is really just a 16 bit alu thing with a 32 bit add I think
23:16 karolherbst: but maybe the 32 bit add isn't even given in all cases
23:20 karolherbst: pendingchaos: feel free to add support for them after you figure them out fully :p
23:23 karolherbst: HdkR: this xmad instruction is totally crazy though
23:23 karolherbst: it is even faster doing 3 XMADs than one IMUL
23:23 HdkR: Yes
23:23 HdkR: Yes it is :)
23:23 karolherbst: for short it only needs one XMAD :)
23:23 karolherbst: XMAD.S16.S16 R0, R3, R0, RZ;
23:23 HdkR: aye
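
The "3 XMADs instead of one IMUL" remark above can be illustrated with plain arithmetic: the low 32 bits of a 32x32 multiply can be assembled from three 16x16 multiply-adds. The sketch below is only a model of that decomposition under stated assumptions. The helper names `mul16x16_add` and `mul32_lo` are hypothetical, and the code does not model the real XMAD modifiers, operand selection, or encoding.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul16x16_add(a_half, b_half, addend):
    """Hypothetical model of one step: unsigned 16x16 product plus a
    32-bit addend, wrapped to 32 bits. Not the real XMAD semantics."""
    return ((a_half & MASK16) * (b_half & MASK16) + addend) & MASK32


def mul32_lo(a, b):
    """Low 32 bits of a * b, built from three 16x16 multiply-adds."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # Steps 1 and 2: accumulate the two cross products.
    cross = mul16x16_add(a_lo, b_hi, 0)
    cross = mul16x16_add(a_hi, b_lo, cross)

    # Step 3: low product plus the cross terms shifted into the upper half.
    # The a_hi * b_hi term only affects bits 32 and up, so it is dropped.
    return mul16x16_add(a_lo, b_lo, (cross << 16) & MASK32)


def mul16(a, b):
    """A 16-bit unsigned multiply needs a single step in this model."""
    return mul16x16_add(a, b, 0)
```

In this model `mul32_lo(a, b)` equals `(a * b) & 0xFFFFFFFF` for any 32-bit unsigned inputs, while a 16-bit multiply is a single step, which matches the "for short it only needs one XMAD" remark.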
23:24 karolherbst: but BFE for sign extension
23:24 karolherbst: on both sources
23:24 karolherbst: it is gone with volta though
23:25 HdkR: Hm? You don't need BFE on the sources
23:25 karolherbst: yes
23:25 karolherbst: because the hw is stupid
23:26 karolherbst: maybe an unsigned BFE would be enough.. but meh
23:26 HdkR: The encoding gives you zero or sign extension per source and complex selection for the addition reg
23:27 karolherbst: BFE?
23:27 karolherbst: well right, because you specify if it's signed or unsigned
23:27 karolherbst: for ushorts doing an AND 0xffff is enough
23:27 karolherbst: but not for shorts
23:27 karolherbst: there you need your sign extended BFE ;)
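
The difference between masking and sign extension mentioned above can be sketched as follows. This is a minimal model assuming a 16-bit value sits in the low half of a 32-bit register whose upper half may hold unrelated bits. The function names `zext16` and `sext16` are hypothetical, and the code does not model the actual BFE instruction.

```python
def zext16(reg):
    """Zero-extend: keep the low 16 bits, clear the upper 16 (AND 0xffff)."""
    return reg & 0xFFFF


def sext16(reg):
    """Sign-extend: copy bit 15 of the low half into bits 16..31."""
    low = reg & 0xFFFF
    if low & 0x8000:
        return low | 0xFFFF0000
    return low


# A register holding the short value -2 (0xFFFE) with stale upper bits.
reg = 0x1234FFFE
as_ushort = zext16(reg)  # 0x0000FFFE, i.e. 65534: right for a ushort
as_short = sext16(reg)   # 0xFFFFFFFE, i.e. -2 as a 32-bit signed value
```

For an unsigned 16-bit value the mask alone gives the correct 32-bit result. For a signed one the mask turns -2 into 65534, which is why a sign-extending step is needed before any operation that looks at the full 32-bit register.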
23:28 HdkR: That shouldn't be the case :|
23:28 karolherbst: maybe the nvidia compiler is stupid
23:28 karolherbst: who knows
23:28 karolherbst: but
23:28 karolherbst: from my tests I needed that BFE for implementing shorts and chars
23:29 HdkR: chars yes
23:29 karolherbst: shorts as well
23:29 karolherbst: the reg is still 32 bit
23:29 HdkR: Sure
23:29 karolherbst: and you don't have real 16 bit alu
23:29 karolherbst: it is all fake
23:29 HdkR: Of course
23:30 karolherbst: so you need to sign extend
23:30 karolherbst: for some stuff it just doesn't matter like an add
23:30 HdkR: But the instruction already gives you sign extension on the multiply sources just not on the add :P
23:30 karolherbst: uhm well
23:30 karolherbst: for add it still matters
23:31 karolherbst: HdkR: do you know it for sure?
23:31 HdkR: You only get zext on the adder bit
23:31 HdkR: aye
23:32 karolherbst: then either nvidia is lazy or you are wrong :p
23:32 HdkR: I'll go for lazy :D
23:33 karolherbst: :D of course you are
23:33 HdkR: Oh snap
23:33 karolherbst: I guess they don't optimize that well for 16/8 bit cases
23:34 karolherbst: anyway, this is all part of the fun figuring out what XMAD really does
23:34 HdkR: 32bit is the most common anyway
23:34 karolherbst: yeah
23:34 HdkR: Which is all Dolphin really cares about ;)