01:23mew007: hello all
01:23mew007: May I talk to Dr. Martin Peres?
01:24mew007: skeggsb_: hello
02:14mew007: are you alive? or you all hangovered to smithereens?
02:15mew007: ducking alcoholics...
02:52mew007: ok. I'll be back when you are sober
04:15RSpliet: imirkin: do you happen to know of a way to generate 4-byte shader instructions on nv50 easily?
04:15RSpliet: (calim: ^)
06:07pmoreau: imirkin: For which Tesla cards do you want piglit results?
09:43imirkin: RSpliet: nv50_ir_emit_nv50.cpp should auto-emit them... not sure what your question is...
10:17RSpliet: imirkin: I want to hand craft them for testing purposes
10:17imirkin: use envyas?
10:17RSpliet: and then upload them?
10:18imirkin: no clue -- what do you mean by "hand craft"? and what sort of testing purposes?
10:18RSpliet: thing is, I found the sat modifier on the 8-byte mul opcode for NV50, and wanted to see if I can find it for the 4-byte opcodes as well
10:18imirkin: ok... so find a piglit with a shader that would trigger that optimization
10:18RSpliet: the only way to test, is to have a shader that generates a 4-byte opcode
10:18imirkin: and then modify the nv50_ir_emit_nv50.cpp code to generate 4-byte opcodes for that situation
10:19RSpliet: yeah, it sounded easier to create a shader rather than look for it
10:19RSpliet: mmhmm, but somehow if the scheduler decides that there's no dual issue possible, it'll generate 8-byte opcodes anyway
10:19RSpliet: or so it seems
10:20imirkin: dual issue?
10:20imirkin: that's not a thing on nv50 iirc
10:21imirkin: can you pastebin the tgsi?
10:22imirkin: note that if one of the args is an immediate, that forces encsize to 8
10:22imirkin: or if it has an exit modifier
10:23RSpliet: I don't have TGSI right now, but my attempts probably failed because I used immediates
12:35pmoreau: imirkin: max-texture-size caused a TRAP_M2MF (trapped read: reason PAGE_NOT_PRESENT) on my G80 it seems.
12:35pmoreau: I'll restart the computer and try it again to confirm
12:37imirkin: yeah, max-texture-size can cause some issues sometimes too
12:39pmoreau: Apparently it's GL_TEXTURE_2D, internal format = GL_RGBA32F
12:39pmoreau: Ok, I'll remove that test from the run
13:32pmoreau: imirkin: Results sent (G80 + G86)
13:33imirkin: pmoreau: received, thanks!
13:33imirkin: hopefully these don't have the texelfetch errors
13:34pmoreau: Let's hope it doesn't
13:35imirkin: still does
13:35imirkin: i dunno wtf you're doing
13:36imirkin: but you consistently get texelFetch to fail
13:36imirkin: [ 5560.852179] systemd-journald: /dev/kmsg buffer overrun, some messages lost.
13:36imirkin: welcome to systemd
13:36imirkin: if you can somehow destroy it, that'd be great
13:36imirkin: /dev/kmsg buffer overrun, so let me just log something to kmsg real quick...
13:36pmoreau: Meh… Is there some special configuration needed for texelFetch?
13:36imirkin: "Keyboard Error. Press F1 to continue."
13:37imirkin: i have no idea how you're achieving the texelFetch fails
13:37pmoreau: Eh eh! I didn't believe the keyboard error message was real, until I experienced it
13:37imirkin: and this is presumably not on your macbook
13:37imirkin: yeah, it's easy to get if you mash the keys during POST
13:37pmoreau: Right, it is on my desktop computer
13:38tibbs: Just boot with log_buf_len cranked way up.
13:38imirkin: all the dmesg-warn tests are just full of that junk
13:38pmoreau: I probably forgot to modify the config file on the desktop; I have it at max (21) on my laptop
13:38imirkin: like 1000x that buffer overrun message
13:39pmoreau: I should probably remove Nouveau debug messages too :D
13:39imirkin: the doubly-odd thing is that it only happens with texelFetchOffset
13:39imirkin: nah, those get logged at debug
13:39imirkin: so piglit doesn't pick them up
13:39imirkin: but if you could somehow destroy systemd-journald, that'd be great
13:39RSpliet: pmoreau: but you can just hot-plug a PS/2 keyboard and often it works just fine
13:40pmoreau: RSpliet: I don't have any of those :/
13:40RSpliet: maybe USB too
13:40RSpliet: I only have PS/2 and Bluetooth keyboards
13:40RSpliet: anyway, off-topic, sry
13:41imirkin: RSpliet: btw, if fmul short encoding has a sat modifier, most likely the fmul long encoding does too
13:41imirkin: RSpliet: you can play with it with nvdisasm... sort of. it's really pretty shitty for SM10
13:45RSpliet: imirkin: I'm certain the long encoding has it
13:46RSpliet: in fact, I would've been surprised if it didn't, because most likely fmul, fadd and fmad all use the same multiply-addition hardware unit
13:47imirkin: RSpliet: agreed... but the emit code doesn't handle it. double-check envydis/nv50.c -- the modifiers may already be there
13:47RSpliet: imirkin: I already added the emit bit for the long encoding :)
13:47imirkin: RSpliet: it was recently pointed out that the neg modifiers were missing on min/max
13:47RSpliet: well, in mesa that is
13:47RSpliet: it's sitting in my git-tree
13:47RSpliet: but as long as I don't find the encoding for the IMM format, can't push that patch
13:48RSpliet: not sure how to force an IMM though, the nv50 compiler seems to prefer using a ld from c (if memory serves me right)
13:50imirkin: can you provide examples? i'm not sure what you're talking about
13:50imirkin: IMM is immediate... c is constbuf...
13:50RSpliet: I know
13:50imirkin: they are diff register files and highly non-interchangeable
13:50imirkin: the nv50 compiler starts by putting all immediates into their own registers
13:51imirkin: and then copy-prop will move them directly into instructions if the instructions allow it
13:51RSpliet: highly non-interchangeable on the ISA level, but functionally I guess it's a "decision"
13:51imirkin: if you run with NV50_PROG_OPTIMIZE=0 that should disable all that
13:51imirkin: eh. not a decision that nouveau/codegen makes
13:51imirkin: it has no ability to "modify" the consts
13:52imirkin: a theoretical compiler might, but not this one
13:52RSpliet:takes mental notes
13:52RSpliet: my head is in different material at the moment, so sorry for sounding a bit incoherent every now and then
13:53RSpliet: (although there seems to be understanding :))
13:54imirkin: no worries. this stuff took me a while to work out. i think i have a decent handle on it now though.
13:55RSpliet: sure, I have to give credit to calim for writing this thing... it's mostly well readable once you add "peephole" to the dictionary with a description of "that place where optimisations are done" :D
13:56imirkin: yeah, well peephole opts are a special type of optimizations
13:56imirkin: however it seems to have become the repository of all the opts
13:56RSpliet:wonders where that name came from
13:57imirkin: a peephole is a hole in a door so you can see who's ringing the bell
13:57imirkin: (in regular english)
13:57tibbs: Back when I took compiler theory, I was told the name arose because you look through a very small set of instructions to find sequences to optimize.
13:57imirkin: i think the idea is that it's optimizations that are done based on very little information in a greedy manner
13:57imirkin: or something like that
13:59RSpliet: right right
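The "small window, greedy rewrite" idea described above can be sketched as a toy pass. This is only an illustration: the tuple-based instruction format and the `peephole` function are hypothetical and are not how nv50_ir represents or optimizes instructions.

```python
# Toy peephole pass: look at one instruction (plus the one just emitted)
# and rewrite locally, with no global analysis.
# Instructions are hypothetical (op, dst, src...) tuples.

def peephole(instrs):
    out = []
    for op, dst, *srcs in instrs:
        ins = (op, dst, *srcs)
        # x * 1.0 is just a copy of x.
        if op == "mul" and len(srcs) == 2 and 1.0 in srcs:
            other = srcs[1] if srcs[0] == 1.0 else srcs[0]
            ins = ("mov", dst, other)
        # mov $r0, $r0 does nothing.
        if ins[0] == "mov" and ins[1] == ins[2]:
            continue
        # mov a, b immediately followed by mov b, a: the second is redundant.
        if out and ins[0] == "mov" and out[-1] == ("mov", ins[2], ins[1]):
            continue
        out.append(ins)
    return out


prog = [
    ("mul", "$r1", "$r0", 1.0),
    ("mov", "$r0", "$r1"),
    ("add", "$r2", "$r0", "$r1"),
]
print(peephole(prog))
```

Each rule only inspects the current instruction and at most its immediate predecessor, which is the "very little information" property mentioned in the discussion.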
14:00imirkin: pmoreau: ok... so what insanity are you *consistently* doing to get texelFetchOffset to fail
14:00imirkin: pmoreau: something that's both on your desktop and laptop
14:00pmoreau: imirkin: Launching piglit I guess
14:01imirkin: but no one else's
14:01imirkin: funny compositor?
14:01imirkin: do you secretly go in and modify the code to ignore offsets?
14:01pmoreau: I don't think I have a compositor running, just using Awesome
14:02imirkin: and awesome is a tiling wm?
14:02imirkin: which will futz with window sizes
14:02imirkin: can you run
14:02imirkin: bin/texelFetch offset 140 fs sampler2DRect -auto -fbo
14:02imirkin: oh, but it's fbo, and auto... so no window...
14:03pmoreau: Then which command?
14:03imirkin: no, still run that
14:03imirkin: let me know if it fails
14:03imirkin: if it doesn't then i'll be _really_ surprised
14:06pmoreau: It fails due to some syntax errors
14:14pmoreau: imirkin: Ok, it does fail. :)
14:14imirkin: pmoreau: can you run it without -auto -fbo
14:14imirkin: and take a screenshot
14:14pmoreau: Probably :D
14:19pmoreau: imirkin: http://i.imgur.com/LxxR4Cv.jpg
14:20imirkin: oh, that's ~what it's supposed to look like
14:21imirkin: could you take an actual screenshot though? e.g. use xwd?
14:21imirkin: wait, WTF!
14:21imirkin: it fails for me too now
14:21imirkin: it works with my build
14:22imirkin: but not with my system 10.3.5 build
14:22pmoreau: So, what are you running that others don't? :p
14:23imirkin: well i'm also on nvc0
14:25imirkin: let me try to futz with build options
14:27pmoreau: (I used the one from the "Install Nouveau" page from the Wiki for Mesa)
14:28pmoreau: Btw, there are some options that are listed twice, is it something necessary?
14:35imirkin: well THATS annoying
14:35imirkin: --enable-debug appears to be the flag that fixes it
14:35imirkin: that's very annoying for... you know... debugging
14:37pmoreau: I happened to fix some segfault errors in some library by adding the -g flag to the compile line
14:38pmoreau: Or sometimes the program starts working when you launch it with gdb --'
14:39pmoreau: skeggsb_: ping http://lists.freedesktop.org/archives/nouveau/2014-December/019380.html
14:43RSpliet: imirkin: interesting!
14:43RSpliet: 0: MUL TEMP.x, IMM.xxxx, CONST.xxxx
14:43RSpliet: turns into
14:44RSpliet: EMIT: mov u32 $r0 0x40400000 (8)
14:44RSpliet: EMIT: mul f32 $r0 $r0 c0[0x0] (8)
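The `0x40400000` in the `mov u32` line is the raw bit pattern of the float immediate loaded into `$r0`. A small sketch for decoding such values follows; the helper name `bits_to_f32` is made up for illustration, and the TGSI immediate itself is not shown in the log, so the decoded value is derived from the bit pattern only.

```python
import struct

def bits_to_f32(bits):
    """Reinterpret a 32-bit integer as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(bits_to_f32(0x40400000))  # 3.0
```

Here the sign bit is 0, the exponent field is 0x80 (2^1), and the mantissa encodes 1.5, giving 3.0.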
14:44imirkin: why is that surprising?
14:44imirkin: i don't think you can have a const reference on the first src
14:44imirkin: (check the target though...)
14:45RSpliet: I'm thinking semi-out-loud right now
14:46imirkin: ok. WTF
14:46imirkin: in the working case:
14:46imirkin: 00000018: 1c00dde2 1800001e mov b32 $r3 0x787
14:46imirkin: in the failing case
14:46imirkin: 00000010: 0000dde2 18000000 mov b32 $r3 0x0
14:46imirkin: i must have messed something up BIGTIME when i added ARB_gs5 support...
14:47tobijk: ouch :/
15:06imirkin: found it.
15:06imirkin: that was annoying.
15:08tobijk: that was fast :>
15:15imirkin: pmoreau: http://lists.freedesktop.org/archives/nouveau/2015-January/019636.html
15:22RSpliet: imirkin: thanks for the hints, got all those test cases figured out :)
15:23imirkin: RSpliet: note that it will also try to swap the order of mul (and add/etc) arguments to make it possible to put a constbuf/immediate into the second src
15:24RSpliet: imirkin: I'd expected it to yes :)
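The operand swap described above can be modeled roughly as follows. This is a simplified sketch under assumptions: operands are plain strings, anything not starting with `$r` is treated as a constbuf reference or immediate, and the `COMMUTATIVE` set and `canonicalize` helper are hypothetical names, not nouveau codegen functions.

```python
# Hypothetical model: for commutative ops, move a non-register operand
# (constbuf or immediate) into the second source slot.
COMMUTATIVE = {"mul", "add"}

def is_reg(operand):
    return operand.startswith("$r")

def canonicalize(op, src0, src1):
    """Swap sources so a constbuf/immediate ends up in src1 when op commutes."""
    if op in COMMUTATIVE and not is_reg(src0) and is_reg(src1):
        src0, src1 = src1, src0
    return op, src0, src1


print(canonicalize("mul", "c0[0x0]", "$r0"))  # ('mul', '$r0', 'c0[0x0]')
print(canonicalize("sub", "c0[0x0]", "$r0"))  # unchanged: sub does not commute
```

When both sources are non-registers, as in the `MUL TEMP.x, IMM.xxxx, CONST.xxxx` case earlier, a swap alone cannot help, which is consistent with one operand being moved into `$r0` first.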
15:25imirkin: RSpliet: can you add the relevant changes to envydis/g80.c ?
15:25imirkin: that file is ~impossible to understand first time 'round, so let me know if you have questions
15:25RSpliet: I'll erm... take a look to see if I understand the format :)
15:25imirkin: lots of macro magic
15:25imirkin: lots of cleverness
15:26imirkin: makes a lot more sense once you "get" it
15:26RSpliet: oh, I did only test this on NVA8
15:27RSpliet: which, admittedly, is bad practice in a lot of cases
15:27imirkin: it's unlikely something like that would have been added at that late stage
15:27imirkin: but perhaps get pmoreau to check it out on the G80
15:27RSpliet: I reckoned, but still :)
15:29imirkin: how sure are you about the 20 for the encsize == 8?
15:29RSpliet: really sure
15:32imirkin: RSpliet: as long as you're fixing things, the modifiers on OP_ADD should prolly be the same as on OP_SUB...
15:34RSpliet: well that sounds somewhat sensible... I'll keep that in the back of my head
15:35imirkin: just missing the sat enable
15:36RSpliet: will it emit an ADD with negative SRC1?
15:37RSpliet: it seems to :)
15:38imirkin: yeah... it flips the neg modifier on src1 at emit time
15:39imirkin: RSpliet, pmoreau: can one of you test out my patch http://lists.freedesktop.org/archives/nouveau/2015-January/019636.html to make sure it fixes things on nv50 (e.g. bin/texelFetch offset 140 fs sampler2DRect -fbo -auto) for release builds
15:41imirkin: RSpliet: btw, if you want a non-compiler related thing to fix on that nva8, there's some transform feedback stuff (like drawing from a saved tfb, probably others)
15:42RSpliet: imirkin: but the compiler is that one thing I _do_ understand sort of :p
15:43imirkin: RSpliet: hehe, well... that was one of the areas i started in too
16:04RSpliet: imirkin: you can add yours truly as Tested-by ;-)
16:07imirkin: RSpliet: for the offsets thing? thanks
16:10imirkin: npo being "np" or "nope"?
16:12RSpliet: heh, No PrOblem
16:14RSpliet: apparently also means Natalie Portman Obsessives...
22:14imirkin: RSpliet and pmoreau: if either of you get a chance, give this a shot on nva0+ (nvac/nva8 both qualify) -- https://github.com/imirkin/mesa/commit/c01bb891221e3cd3e030fc33042fd32a80a19cc4 -- and run bin/arb_transform_feedback2-* that'd be cool.
22:16imirkin: let me know if there's any change compared to baseline
23:29pmoreau: imirkin: I'll give them all a try this evening. :)