00:00imirkin: RSpliet: for completeness, sure
00:00RSpliet: Ugh, I get sucked into these things too easily
00:00imirkin: RSpliet: but this is like 10 levels up from what we're dealing with here
00:02RSpliet: imirkin: if nouveau rejects the 10bpc mode offhand, that would lead to selecting a mode with presumably a lower clock. Not sure if that'd solve the problem (might still be too high), but unless we know a hard cap on the clock, wouldn't that be worth sth?
00:02imirkin: RSpliet: the mode isn't 10bpc
00:02imirkin: the modeline doesn't have a concept of bpc-ness
00:03imirkin: the problem here appears to be that high-ish clock values are rejected due to us doubling them (and then dividing by 2)
00:03imirkin: as well as supplying the pixel depth at all, which isn't supported on older classes
00:07RSpliet: Okay, but the pixel depth it's trying to set is '6' which is PIXEL_DEPTH_BPP_30_444 . Does that refer to a different concept (e.g. an input buffer, rather than an output pixel stream)?
00:11imirkin: it's the output
00:12imirkin: and has to be coordinated with other things, like the infoframe
00:12imirkin: which i'm not sure is supported with PIOR
00:31shadow:compiling kernel
00:42kherbst: imirkin: btw.. did you find some time to verify that my RA fix fixes that firefox crash? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5277?commit_id=69acccd2cb8315114333d33da44a0df900e9c874
00:42kherbst: ehh.. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5277
00:42imirkin: haven't tried
00:43kherbst: would be cool to know
00:43kherbst: I mean.. I verified it locally with the shader.. just wondering if it also works with firefox or if other random shit happens
01:50shadow:still compiling kernel
01:58imirkin: kherbst: building
01:59imirkin: kherbst: what was the url again? like store.google.com?
02:02imirkin: kherbst: no more crash
02:02imirkin: and no obvious misrendering
02:03imirkin: and it still crashes with mesa-20.1, so it's not like they changed the page
03:07shadow: well. kernel compile is done. here we go.
03:15shadow: imirkin: it for sure behaves differently.
03:22shadow: imirkin: briefly I saw a very blue image on the fancy monitor. It's progress I suppose :)
03:24imirkin: shadow: can you again boot with drm.debug=0x1e nouveau.debug=disp=trace and provide the log?
03:25imirkin: skeggsb: memset(asyw->image.handle, 0x00, sizeof(asyw->image.handle)); -- that can't be right
03:26shadow: imirkin: will do. the system isn't locked up being busy at least. do you have any particular action (or not) I should try?
03:26imirkin: skeggsb: should that be sizeof(*foo.handle)?
03:26imirkin: shadow: run dmesg
03:26shadow:brb
03:26imirkin: shadow: but with those flags
03:27skeggsb: imirkin: no, i wanted to memset the entire array
03:27skeggsb: not that we use it atm
03:27imirkin: skeggsb: oh, it's an explicit array? heh, ok.
03:29skeggsb: u32 handle[6];
03:29skeggsb: yup
03:31skeggsb: for planar + stereo etc
03:32imirkin: delightful
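
The sizeof question above comes down to whether `handle` is an array member or a pointer. A minimal sketch of the difference follows; `struct image_state` and both function names are hypothetical stand-ins, and only the `u32 handle[6]` declaration is taken from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in struct; only `handle[6]` comes from the log. */
struct image_state {
    uint32_t handle[6];
};

/* handle is a real array member, so sizeof(img->handle) is the size of
 * the whole array (6 * sizeof(uint32_t)), not the size of a pointer. */
static size_t clear_handles(struct image_state *img)
{
    assert(img != NULL);
    memset(img->handle, 0x00, sizeof(img->handle));
    return sizeof(img->handle);
}

/* Contrast: given only a pointer, sizeof(handles) would be the pointer
 * size, so the element count has to be passed in explicitly. */
static size_t clear_via_pointer(uint32_t *handles, size_t count)
{
    assert(handles != NULL);
    memset(handles, 0x00, count * sizeof(*handles));
    return count * sizeof(*handles);
}
```

Calling `clear_handles()` on a struct clears all six entries. The `sizeof(*foo.handle)` form would only have been the right fix if `handle` had been a pointer to a single object.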
03:35skeggsb: i tried for planar formats when i did volta bring-up, but hw apparently didn't support it yet, despite the class being re-jigged to allow it
03:35skeggsb: dunno if turing fixed that or not, haven't had time to dig in, maybe Lyude will get to it when she works on some improvements there
03:36shadow: imirkin: done. https://gist.github.com/eshattow/452df2c275426adb61bbcea4a434f618
03:37shadow: imirkin: I plugged in the fancy monitor DP and then used gnome display settings a few different ways until at last the very-blue output appeared on the fancy monitor, before dmesg.
03:37imirkin: shadow: hm, that's not from the start...
03:37shadow: imirkin: ring buffer is short
03:37shadow: hang on
03:37imirkin: i'm most interested in what happens right after you plug it in and it does the first modeset
03:38imirkin: shadow: also did you try setting e.g. 1920x1080@60 and see if that Just Works ?
03:38shadow: it doesn't seem to catch
03:39shadow: I tried mirrored 1920x1080@60 and I can't see if it works or not but it doesn't show anything
03:39imirkin: skeggsb: does that nva0 with dp on pior still work btw?
03:40skeggsb: last i tried it did, but it's been a while
03:43imirkin: shadow: log_buf_len=10M will increase the dmesg buffer. or you can get it out of journalctl again
03:49shadow: imirkin: it's 4M+ so http://ai6fs.net/dmesg-nvtest.log.xz
03:49imirkin: thanks
03:50shadow: that time I did just enough with Gnome Display Prefs to get "BenQ" to show up in Gnome Display Prefs but did not mess any further
03:50imirkin: k
03:51imirkin: skeggsb: can you help me understand what this means? https://hastebin.com/perikojiqa.pl
03:53imirkin: shadow: is your panel using a 1680x1050 mode out of curiosity?
03:55imirkin: shadow: looks like link training fails, which is why you don't see anything
03:55shadow: imirkin: the laptop has a native 1920x1200 resolution, and the BenQ EX3501R is ultrawide 1440p whatever that happens to be
03:55imirkin: shadow: yeah ... i just see it trying to set a 1680x1050 mode, which is why i was asking
03:56shadow: the laptop in Gnome Display Prefs does list 1680x1050 as the next lowest mode of the same aspect ratio (16:10)
03:58imirkin: oh, that print is busted.
03:58imirkin: or .. not
04:05imirkin: skeggsb: was nva0 the one with anx9805?
04:05skeggsb: yup, it's the only external chip we support
04:05skeggsb: "support"*
04:05imirkin: i see. so this G92, in order to work with XYZ chip would need chip XYZ support
04:06imirkin: shadow: can you provide your vbios?
04:06shadow: imirkin: sure, how?
04:06imirkin: shadow: /sys/kernel/debug/dri/0/vbios.rom
04:06shadow: 1sec
04:06skeggsb: presumably it's anx9805 too, or it'd have failed differently
04:06imirkin: skeggsb: i see no prints about anx9805
04:07imirkin: but i didn't ask him to enable aux debug. oops.
04:07skeggsb: you won't, but the i2c device that we create wouldn't exist for the connector if it were an unsupported one
04:07skeggsb: you'd see:
04:07skeggsb: nvkm_debug(&i2c->subdev, "dcb %02x drv %02x unknown\n",
04:07skeggsb: i, dcbE.extdev);
04:07skeggsb: something like that
04:08skeggsb: and you wouldn't get ddc either
04:08imirkin: ah ok
04:08shadow: imirkin: http://ai6fs.net/vbios.rom.xz
04:09imirkin: shadow: thanks
04:09shadow: oh that is a small file I didn't need to compress it
04:10imirkin: skeggsb: where is it in the vbios?
04:11shadow: I've changed modes a few times if it matters, between the 'journalctl -k -b 0 -o short-precise' dmesg-nvtest log and the vbios.rom information
04:13skeggsb: imirkin: https://nvidia.github.io/open-gpu-doc/DCB/DCB-4.x-Specification.html#_i2c_device_table
04:14imirkin: I2C table at 0xbeb5 version 4.0 defaults 0 1
04:14imirkin: I2C 0: type 0x05 [PNVIO] loc 0
04:14imirkin: this is what nvbios prints...
04:14skeggsb: also, DCB has an "external link type" field
04:15skeggsb: iirc that's what we use
04:15imirkin: yeah
04:15imirkin: https://hastebin.com/upavatawog.css
04:18imirkin: not sure where the anx9805 reference would be there
04:19shadow: 1920x1080 mirrored is the mode that shows blue channel output on the BenQ EX3501R
04:19shadow: "1920x1080 99.92* 60.00 60.00 50.00 59.94 30.00 25.00 24.00 29.97 23.98"
04:20shadow: LVDS only at 1920x1080 60.01*
04:20shadow: sorry for the noise this is rather exciting to get any kind of output :)
04:22imirkin: shadow: oh hm, crap. i think it might still be trying to do stuff at 10bpc
04:22imirkin: let me fix my patches
04:26imirkin: shadow: recheck that branch. i just pushed another commit. might help.
04:26imirkin: shadow: next time you boot, please use nouveau.debug=disp=trace,i2c=trace
04:27imirkin: (and keep the drm.debug in there too)
04:29shadow: ok patching compiling will try
05:14imirkin: shadow: i'm out. but make that dmesg available, i may be able to glance at it tomorrow.
05:19HdkR: oop, netsplit killed me
05:20HdkR: Congrats on the massive Volta and Turing PR! :D
05:23skeggsb: HdkR: thanks, hopefully it works well enough up-front
05:24HdkR: I see you also had to handle the case of imnmx missing :P
05:27skeggsb: i should probably add that back for turing sometime
05:27skeggsb: but, there's a LOT that turing can add that wasn't done... basically just re-used sm70 stuff for it for the moment
05:28HdkR: Yea, it's quite similar in that regard
05:28HdkR: Fix the program header differences and off you go :P
05:28skeggsb: it was a pleasant surprise that it's backwards compatible... i'd feared the worst
05:29skeggsb: it has some less pleasant surprises at the class level but :P particularly mme
05:31HdkR: I'm happy that I never had to look at MME
05:32skeggsb: it's... fun? :P
05:33HdkR: I've heard horrors about some of the features, I don't believe you
05:34skeggsb: there might have been a /s on the end of that
05:34HdkR: :)
05:35imirkin: skeggsb: can you double-check that https://gitlab.freedesktop.org/mesa/mesa/-/commit/e22a86bd15bfdc1726f5dfe4d63390546a0c6afb?merge_request_iid=5377 is actually required?
05:35imirkin: that goes against every fifo principle i'm aware of...
05:35skeggsb: what do you mean?
05:36imirkin: 2 separate command groups vs 1 with 2 sequential writes
05:36imirkin: should be identical.
05:36skeggsb: it doesn't matter how you send the methods, in groups or alone
05:36imirkin: so why is that change required?
05:36skeggsb: i didn't want to stick "if (class < GV100_3D_CLASS) {} else {}" everywhere?
05:37imirkin: oh. there's a later change.
05:37imirkin: i see.
05:38skeggsb: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377/diffs?commit_id=f440e420ef355f5aeec52905632cd7d4f2b1aec8#5de40319d272d1145433669282002090697c34ee_904_912
05:38skeggsb: errr, not that bit
05:39imirkin: well wtvr. i might look at it later.
05:39skeggsb: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377/diffs?commit_id=f440e420ef355f5aeec52905632cd7d4f2b1aec8#1e140be0ffe623c9a52eb73abeccdd1edd427cc7_75_73
05:39skeggsb: that bit
05:39imirkin: looks like you chose nir ... have to be a bit careful, since the caps are probably in a bit of disrepair for nir
05:39imirkin: i.e. features have been added via tgsi that weren't checked for nir
05:40imirkin: and any new features i'm adding for nouveau are tgsi-only, so ... yeah.
05:40skeggsb: that seems... like not a fantastic direction to go in
05:40imirkin: i never supported nir.
05:40skeggsb: given a bunch of stuff is *only* being done in core mesa for nir
05:41imirkin: anyways, that's the present situation.
05:41skeggsb: yeah, i'm aware. if someone wants to add tgsi support, they're welcome, i didn't see the need to reinvent the wheel here
05:41imirkin: ok
05:41imirkin: so like i said ... double-check caps, i expect some stuff either needs piping through nir or needs cap-disabling
05:42imirkin: the viewport mask stuff comes to mind.
05:42skeggsb: the initial version supported both, and codegen becomes a nightmare for gv100
05:43imirkin: and i'm sorta-working on passthrough gs, which again would be tgsi-only. although it's mostly in glsl anyways, so perhaps not a big deal.
05:44imirkin: otoh, perhaps it's just time for me to bow out, since my views aren't really meshing with the RH agenda
05:44skeggsb: it's got nothing to do with RH as a company, i made these decisions myself
05:44imirkin: ok
05:44imirkin: s/RH/rest of the team/ then?
05:45skeggsb: because, well, they make sense. we get to leverage the work done by all the other drivers, and we have very little manpower
05:45skeggsb: nope, i'll take the "blame" for this one solo
05:46imirkin: well, karol did the nir -> nvir stuff on RH time too, i think, as part of some compute efforts
05:46skeggsb: presumably he came to the same conclusions that i did
05:46skeggsb: you'd have to ask him :)
05:47imirkin: i figured you guys were "close", but perhaps you're on opposite ends of the company as well as the world :)
05:48skeggsb: we cross paths, but don't have "evil plan to conquer the world" meetings or anything :P
05:48imirkin: too bad
05:49imirkin: those are fun. esp if you're at the seat with all the buttons.
06:33airlied: GL4.6 is really a good reason to use NIR, I doubt anyone wants to write a NIR->TGSI backend for SPIR-V
09:39RSpliet: airlied: don't tempt imirkin... :-P
10:15kherbst: imirkin: nice, thanks for verifying
10:24RSpliet: skeggsb: on that PR, nvc0_program.c:649: prog->num_gprs = MAX2(4, MIN2(info->bin.maxGPR + 5, 256));
10:25RSpliet: Unless maxGPR can be negative, I don't think the result of the MIN2 can be equal or lower than 4 ever, rendering the MAX2() superfluous
10:26RSpliet: ... actually, if I sign in I can just add a comment to the PR perhaps
10:28RSpliet: Ooo that works. I feel all fuzzy now, I did a thing!
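
RSpliet's point about the clamp can be checked by brute force. This is a rough sketch, assuming local `MIN2`/`MAX2` macros with the usual ternary definitions (not copied from Mesa) and hypothetical function names. It compares the expression with and without the outer `MAX2(4, ...)`.

```c
#include <assert.h>

/* Local stand-ins for the macros used in the quoted line (assumed to be
 * the usual ternary min/max). */
#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* The expression as quoted from nvc0_program.c. */
static int num_gprs_clamped(int max_gpr)
{
    return MAX2(4, MIN2(max_gpr + 5, 256));
}

/* The same expression without the outer MAX2. */
static int num_gprs_min_only(int max_gpr)
{
    return MIN2(max_gpr + 5, 256);
}

/* Returns 1 if both forms agree for every max_gpr in [lo, hi]. */
static int clamps_agree(int lo, int hi)
{
    assert(lo <= hi);
    for (int v = lo; v <= hi; v++)
        if (num_gprs_clamped(v) != num_gprs_min_only(v))
            return 0;
    return 1;
}
```

The two forms agree for any `max_gpr` of -1 or more, since `max_gpr + 5` is then at least 4. They only diverge at -2 and below, so the outer `MAX2` matters only if `maxGPR` can go that low.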
10:29shadow:applauds
10:35kherbst: RSpliet: btw, I'd like to see a mathematical proof of that LOP3_LUT thing
10:35kherbst: maybe you encountered something once?
10:35RSpliet: kherbst: which LOP3_LUT thing?
10:36kherbst: RSpliet: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377/diffs?commit_id=c7ab38ffbbdedb1e22357ab51b03b7aebcba4b23
10:36kherbst: and you use it like this: subOp = NV50_IR_SUBOP_LOP3_LUT(a | (b & ~c))
10:36kherbst: and the hw will give you the correct result
10:36kherbst: can be any expression
10:37kherbst: ~a & (~b | c) | (a & ~c)
10:37kherbst: as well
10:37kherbst: more like.. why are those magic numbers correct for what the op is doing
10:40RSpliet: So... looking at it 6 seconds, the magic numbers encode a lookup table. They precompute bitwise what the output should be for each possible combination of a,b,c - for which there are only 8 possible combinations
10:40RSpliet: Hence a uint8_t
10:40kherbst: 8?
10:40kherbst: ahh yeah
10:40kherbst: per source
10:41kherbst: 2^3
10:41RSpliet: Exactly
10:43RSpliet: I don't know exactly how each of the 8 bits relate to each of the possible combinations of a,b,c inputs, but I presume that's how they're derived
10:43kherbst: yeah.. that much is clear
10:43kherbst: I was mainly wondering why those magic numbers work :p
10:44kherbst: 0x45 = (~a & ~b) | ((a ^ b) & c) is one example
10:44kherbst: and if you put 0x45 as the subop you get the result of the expression
10:45kherbst: it's like magic :p but I am sure there is some obvious math behind it
10:45RSpliet: I'd just do the "dumb" thing and write a truth-table
10:45kherbst: I am just too lazy to think about it myself :D
10:45kherbst: yeah...
10:45kherbst: probably
10:46kherbst: but we have in the end 256 valid values :p
10:47kherbst: or maybe some are just garbage...
10:47kherbst: but coming up with a truth table here could be very annoying
10:47kherbst: the constant folding code kind of gives an idea on how it works in hw
10:47kherbst: probably
10:48RSpliet: I don't see how they're related at all
10:49kherbst: yep...
10:49kherbst: I am more fascinated here on why that works like this
10:50kherbst: I am mainly wondering if there are any limitations or if it just works for _all_ expressions
10:50RSpliet: It should work for all three-input expressions
10:50kherbst: yeah.. but you can reuse inputs as often as you want
10:51RSpliet: Because you can reduce all three-input expressions to only 8 possible input combinations for each bit
10:51RSpliet: It's how FPGAs work
10:51kherbst: mhhh
10:51kherbst: I see
10:53RSpliet: I'll admit I'm slightly baffled as to why "0x45" would be the value for the expression you gave. I'd expect the truth table to contain four bits set and four cleared.
10:53kherbst: you just use the magic number for the sources
10:53kherbst: and 0x45 is the result
10:53kherbst: and then the hw op gives you the result of the expression for any numbers
10:54kherbst: so a = 0xf0, b = 0xcc and c = 0xaa
10:54RSpliet: Ah yes that bit is easy to explain too
10:56RSpliet: that just gives away that the output bit 0 of the result, say res[0] = LUT[(a[0],b[0],c[0])] ... the concatenation of the three bits
10:57kherbst: yeah.. I mean that's kind of what the constant folding code is showing how the subop relates to the result
10:58kherbst: the subop is in the end the LUT here
10:59kherbst: I am more surprised that you only need those three magic numbers and create those LUTs just by inserting them into the expression.. that's all
10:59RSpliet: The three magic numbers are all 8-bit for a reason. It essentially does 8 evaluations of the bit-wise expression in parallel
11:00RSpliet: Covering all combinations of (a,b,c)
11:00kherbst: 0xf0 = 11110000 0xcc = 10101010 0xaa = 10011001
11:01kherbst: ohh yeah
11:01kherbst: now I see it
11:01RSpliet: The chosen order is interesting
11:01kherbst: yep
11:02RSpliet: But I guess that is just "hw spec" or sth
11:02kherbst: yeah.. has to be :p
11:03RSpliet: 0xCC = 11001100 by the way. And 0xAA = 10101010
11:03RSpliet: Which has a more logical ordering
11:03RSpliet: But the same principle applies
11:03kherbst: ohh.. right
11:04kherbst: it's more obvious with those numbers actually
11:04kherbst: interesting
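
The "magic numbers" discussed above can be reproduced in a few lines. This is only a sketch: the macro and function names are hypothetical, and the bit ordering (a as the high index bit, c as the low one) is an assumption that matches the constants 0xf0, 0xcc and 0xaa from the log. Evaluating an expression on the three constants runs it once per truth-table row, and the 8-bit result is the LUT.

```c
#include <assert.h>
#include <stdint.h>

/* The three per-source constants from the log. */
#define LUT_A 0xf0u /* 11110000 */
#define LUT_B 0xccu /* 11001100 */
#define LUT_C 0xaau /* 10101010 */

/* Read one truth-table row out of a LUT. */
static unsigned lut_bit(uint8_t lut, unsigned index)
{
    assert(index < 8);
    return (lut >> index) & 1u;
}

/* Check that at row index i = (a << 2) | (b << 1) | c, bit i of each
 * constant equals that row's value of the matching input.
 * Returns 1 if this holds for all 8 rows. */
static int constants_enumerate_inputs(void)
{
    for (unsigned i = 0; i < 8; i++) {
        unsigned a = (i >> 2) & 1u, b = (i >> 1) & 1u, c = i & 1u;
        if (lut_bit(LUT_A, i) != a || lut_bit(LUT_B, i) != b ||
            lut_bit(LUT_C, i) != c)
            return 0;
    }
    return 1;
}

/* LUT for a & b: the expression evaluated on the constants. */
static uint8_t lut_and_ab(void)
{
    return (uint8_t)(LUT_A & LUT_B);
}

/* LUT for a | (b & ~c), the first expression quoted in the log. */
static uint8_t lut_a_or_b_andnot_c(void)
{
    return (uint8_t)(LUT_A | (LUT_B & ~LUT_C));
}
```

Because the three constants together enumerate every (a,b,c) combination exactly once, any bitwise expression over them yields its own truth table. That is why all 256 LUT values are meaningful.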
11:05RSpliet: What I wonder is why the LOP3_LUT loops 32 times. Can you provide a different LUT for every bit in your inputs?
11:05kherbst: you have 32 bit inputs
11:05kherbst: :p
11:05RSpliet: Yes
11:05RSpliet: Can you provide a different LUT for every bit in your inputs?
11:05kherbst: no
11:06kherbst: how'd you encode it?
11:06RSpliet: Yeah, not without a hell of a lot of space
11:07kherbst: but maybe we could come up with better code here :p
11:07RSpliet: So what's the for-loop in nv50_ir_peephole doing?
11:07kherbst: I just don't see how though
11:08kherbst: RSpliet: we have to check every bit of the inputs against the LUY
11:08kherbst: *LUT
11:08RSpliet: manually? Oh this is lowering code?
11:08kherbst: no, constant folding :p
11:09kherbst: so if all your three inputs are constants, we can calculate the ops result like this
11:09RSpliet: Right
11:09RSpliet: Yep, that's easy enough
11:10kherbst: essentially if we could translate the subop back to the exp, we only need to execute the expression :p
11:10kherbst: but that sounds painful to do
11:11RSpliet: the uint8_t lut becomes the concatenation of (a[n],b[n],c[n]), so really the index _into_ the LUT, and then you check that bit in the LUT. Double-negation to force a 0 or 1 (not sure if superfluous), then shift into the output
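
The folding loop RSpliet describes can be sketched as follows. This is not the code from nv50_ir_peephole. It is a minimal stand-alone version with a hypothetical function name, assuming the same bit ordering as the 0xf0/0xcc/0xaa constants (a is the high index bit).

```c
#include <assert.h>
#include <stdint.h>

/* Fold a three-input LUT op over constant 32-bit inputs: for each bit
 * position n, build the 3-bit index (a[n], b[n], c[n]), look that bit up
 * in the 8-bit LUT and shift it into position n of the result. */
static uint32_t lop3_fold(uint8_t lut, uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t res = 0;

    for (unsigned n = 0; n < 32; n++) {
        unsigned index = (((a >> n) & 1u) << 2) |
                         (((b >> n) & 1u) << 1) |
                         ((c >> n) & 1u);
        assert(index < 8);
        res |= (uint32_t)((lut >> index) & 1u) << n;
    }
    return res;
}
```

With LUT 0xc0 this computes `a & b`, and with 0xf4 it computes `a | (b & ~c)`. The same LUT is applied at every one of the 32 bit positions.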
11:11kherbst: would be interesting if gcc is able to autovec this one
11:12RSpliet: Nah, this code is fine
11:12kherbst: right
11:12kherbst: yeah.. just.. could be faster :p
11:12kherbst: maybe
11:12kherbst: let's see what compilers do with that
11:13RSpliet: I don't see an obvious way how.
11:13RSpliet: Extracting an expression from a look-up table is tedious, and you'll end up with a sum-of-products which isn't necessarily the shortest possible expression.
11:15RSpliet: The loop body is tiny, I don't expect this to become a bottleneck. The compiler can try unrolling the loop quite easily, but most processors that execute this have branch predictors anyway - it doesn't matter.
11:16RSpliet: Do you have a peephole optimisation to eliminate the instruction in the (hopefully corner-)case where subop is either all-0 or all-1?
11:16kherbst: ufff
11:16kherbst: it's like 30 instructions
11:16kherbst: in total
11:17RSpliet: Sounds irrelevant then :-D
11:17kherbst: no loop
11:17kherbst: :D
11:17kherbst: RSpliet: https://gist.github.com/karolherbst/e64d0e9571ad288a17c7a550d4755cf6
11:17kherbst: ohh
11:17kherbst: it does loop
11:18kherbst: totally missed the label and jump
11:19kherbst: ohh
11:19kherbst: I should have enabled the avx stuff
11:20kherbst: sse4 required for some magic
11:21kherbst: avx2 for really nice code
11:22kherbst: ufff..
11:22kherbst: the hell
11:22kherbst: why is that so complicated :D
11:23RSpliet: it's bitwise magic
11:23kherbst: yeah...
11:24RSpliet: Anyway, if subop is 0xff, you can replace the entire op with an immediate mov rXX,0xffffffff . For subop=0x0, it'd become mov rXX,0x0 . Or however you handle those constants :-P
11:24RSpliet: Which makes for happier constant folding
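
The corner case RSpliet mentions is easy to detect. A rough sketch, with a hypothetical helper name, of the check a peephole pass could make before replacing the op with an immediate move:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If the LUT is all-zero or all-one, the op ignores its inputs entirely.
 * Returns 1 and stores the constant result in *value in that case,
 * otherwise returns 0 and leaves *value untouched. */
static int lop3_constant_result(uint8_t lut, uint32_t *value)
{
    assert(value != NULL);
    if (lut == 0x00) {
        *value = 0x00000000u;
        return 1;
    }
    if (lut == 0xff) {
        *value = 0xffffffffu;
        return 1;
    }
    return 0;
}
```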
11:26kherbst: just that we won't end up with those subops :p
11:26RSpliet: I sure hope we do't
11:26RSpliet: don't
11:27kherbst: I am wondering if there is hw around with such a weird op
11:58RSpliet: kherbst: the op looks like it was designed with hardware in mind, it's _really_ easy to implement.
11:59kherbst: yeah
11:59kherbst: that's why I was hoping if we have an op like this in x86
12:01RSpliet: I suspect x86 bets on old-fashioned expression optimisation and evaluation. Stuff like multi-issue, out-of-order execution and a plurality of ALUs make the benefit for x86 way smaller than for GPUs
12:02RSpliet: I also kind of expect that GPUs have weird use cases in mind, like binary neural networks or crypto
12:04kherbst: maybe
12:04kherbst: well.. we need it anyway :p
12:04kherbst: RSpliet: we use it for INSBF lowering :p
12:05imirkin: shadow: did you ever get around to testing the patch i sent?
12:05kherbst: well.. for volta
12:05shadow: imirkin: nearly done compiling
12:05kherbst: RSpliet: also.. there is no LOP2 :p
12:05kherbst: so we use it for normal and/or/xors as well
12:05imirkin: shadow: you must have done a full recompile? otherwise it should have been ~instant
12:07shadow: imirkin: correct :-/ I am learning my mistake
12:09shadow: imirkin: when I reboot with those debug flags, what actions should I try before saving the kernel messages? So far, just plugging in the DP cable doesn't seem to have an effect, I've had to go to Gnome Display Properties and try changing modes on the laptop display for it to "catch"
12:10shadow: I could try to play with the modes until the BenQ monitor displays some image, or not
12:10imirkin: shadow: just plug it in
12:10shadow: okay
12:10RSpliet: kherbst: there's no compelling need, there would only be a handful of possible LUTs that involve both a and b, which you can probably cover with regular or, and, nand, nor, xor and some negated inputs. Can probably always reduce to 2 ops. Besides, LOP3 with one input ignored is probably still as efficient as it gets.
12:11kherbst: RSpliet: yeah.. volta removed a lot of those pointless things
12:11kherbst: there is also no add anymore
12:11kherbst: just iadd3
12:11kherbst: fadd still exists though
12:12kherbst: although I guess even fadd has three sources actually
12:12RSpliet: Sounds like they wanted to squeeze every last drop out of the instruction encoding.
12:12kherbst: doubtful
12:12kherbst: there is soo much space
12:12kherbst: they just wanted to have a sane layout of stuff
12:13kherbst: so ever instructions i built the same essentially
12:13kherbst: *is
12:13RSpliet: Yep, that makes the instruction decoder a lot simpler, less transistors and latency.
12:13kherbst: yes
12:13kherbst: exactly
12:13kherbst: also no sched opcodes anymore
12:13imirkin: shadow: after you grab the log, feel free to play with it further. but chances are that will just complicate my analysis of the logs.
12:13kherbst: it's just part of the instruction
12:14RSpliet: kherbst: it was going that way. sched blocks at some point went from covering 7 to 3 insn. 1 was the logical next step :-P
12:14kherbst: :p
12:14shadow: imirkin: copy thanks
12:14kherbst: well.. otherwise using 128 would be too much of a waste :D
12:15RSpliet: I suspect it's a bit of a waste regardless... don't suppose they're short of instruction bandwidth.
12:18kherbst: isn't much of a change compared to maxwell/pascal anyway
12:18kherbst: instead of 4 * 64 you have 3 * 128 bits.. so only 50% more
12:18kherbst: and you get rid of a lots of stupid and pointless ops
12:19kherbst: as the encoding allows for more things
12:19RSpliet: 50% is a lot in my book :-P
12:19kherbst: yeah.. but normally you think you need 100% if you double the size :p
12:19RSpliet: but if the instruction decoder is a limiting factor in achieving high clock speeds or sth, and bandwidth from the instruction memory isn't, it's a sensible choice
12:20kherbst: RSpliet: I wouldn't be surprised if the new decoder is actually smaller
12:21RSpliet: Oh undoubtedly, no more awkward caching mechanism for scheduling options, no more "fetching a whole block" on a jump to make sure you get the sched codes
12:22kherbst: also.. there is just one form basically :p
12:22RSpliet: Yeah. I mean part of me wonders whether a 96-bit encoding wouldn't've been sufficient, but I know too little about all this
12:23kherbst: it's a bit different depending on the source kinds
12:23kherbst: but essentially every instruction looks the same
12:24kherbst: RSpliet: it's effectively only 96 bits
12:24kherbst: the other 32 bits are for sched
12:24kherbst: where not everything is used
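
The encoding-size arithmetic above, written out as a small sketch. The figures (4 * 64 bits per three Maxwell/Pascal instructions, 3 * 128 bits on Volta, 32 of the 128 bits used for scheduling) are taken from the log; the function names are hypothetical.

```c
#include <assert.h>

/* Maxwell/Pascal: one 64-bit sched word plus three 64-bit instructions. */
static unsigned maxwell_bits_per_three_insns(void)
{
    return 4 * 64;
}

/* Volta: three 128-bit instructions, scheduling info embedded in each. */
static unsigned volta_bits_per_three_insns(void)
{
    return 3 * 128;
}

/* Integer percentage growth from old_bits to new_bits. */
static unsigned percent_increase(unsigned old_bits, unsigned new_bits)
{
    assert(old_bits > 0 && new_bits >= old_bits);
    return (new_bits - old_bits) * 100 / old_bits;
}

/* Bits left for the instruction itself once the sched bits are removed. */
static unsigned volta_payload_bits(void)
{
    return 128 - 32;
}
```

So three instructions grow from 256 to 384 bits, which is the 50% kherbst mentions, and each Volta instruction carries 96 non-sched bits.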
12:39shadow:off to test patches
13:01shadow: imirkin: http://ai6fs.net/nvt2-kernel.log.xz and http://ai6fs.net/nvt2-vbios.rom.xz I cleaned up the other files and gists
13:02RSpliet: shadow: did you spot my hint for journalctl to use -k to only get kernel messages? I guess at this point it'll make little difference to the log file size, but could make for a slightly easier read :-P
13:03shadow: RSpliet: thank you, yes. I'm using journalctl -k -b 0 -o short-precise
13:04imirkin: shadow: cool. the vbios doesn't change, btw
13:04imirkin: shadow: unfortunately the top appears to have gotten cut off...
13:05shadow: oof. hmm
13:05imirkin: shadow: also did you plug the monitor in?
13:05shadow: imirkin: yes
13:06imirkin: shadow: anyways, your log is cut off at the start
13:06imirkin: and i see no reference to pior's, which leads me to believe something else is wrong too
13:17shadow: imirkin: try http://ai6fs.net/nvt2b.log.xz
13:19imirkin: that looks better
13:19imirkin: still missing the start, but meh
13:20shadow: imirkin: is it missing at the start? I see what you're saying but I'm not sure how to get journalctl to produce differently
13:20imirkin: yeah. i just use "dmesg" :)
13:20shadow: :)
13:20imirkin: and if you have long logs, just add log_buf_len=10M or whatever
13:33shadow: imirkin: okay using dmesg I have these http://ai6fs.net/dmesg-plug.log.xz and http://ai6fs.net/dmesg-plug_and_blue_output.log.xz
13:35shadow: RSpliet: there are differences from journalctl -k and the dmesg output apparently
13:36imirkin: shadow: stupid question -- this monitor works with the DP cable with another PC, right?
13:36RSpliet: shadow: somehow I'm not surprised...journald is my least favourite SystemD feature, too much NIH for my taste
13:36imirkin: shadow: also, chances are if you want to get going, then sticking a DP -> HDMI passive adapter + connecting to HDMI on the monitor would work
13:36shadow: imirkin: it works with the same DP cable and raspberry pi 4, if that helps
13:37shadow: imirkin: err
13:37imirkin: shadow: ah ok
13:37shadow: imirkin: I am mistaken
13:37imirkin: surprised rpi has DP
13:37RSpliet: yeah, don't think it does
13:37shadow: imirkin: HDMI actually
13:37imirkin: ah. do you have a DP -> HDMI adapter?
13:37imirkin: (passive)
13:38shadow: I don't, sorry no. I'm just forgetting how it was hooked up. I'll have a computer to test the DP function built up later this month when parts arrive
13:38imirkin: ah ok
13:38imirkin: no worries
13:39imirkin: i'd tell you to go to a local store and buy one for $5, but ... that may not be an option right now.
14:17imirkin: skeggsb: [ 81.293116] nouveau 0000:01:00.0: i2c: aux 010d: ANX9805 train 4 0a 0
14:18imirkin: skeggsb: that seems like it should work, right? getting timeouts...
22:05kherbst: imirkin: anyway, what was the nir chatter earlier all about? I thought it was clear that spirv -> nvir is a huge waste of time, and that even if we had the time, the TGSI path would waste it on writing a proper compiler, as nir already has everything we would just write on our own anyway
22:06kherbst: or maybe I just misunderstood your complaint
22:06kherbst: also nobody wants to get rid of codegen.. just rework it and fix it
22:07kherbst: and if I'd had to choose today what to base a new compiler on I'd choose nir, because it's the obvious choice
22:07kherbst: I see 0 benefits in even caring about TGSI anymore
22:08RSpliet: kherbst: I presume you mean for NV50+? Although, I can respect your opinion if you claim you see 0 benefits in caring about older cards. :-P
22:09kherbst: RSpliet: nir is supported from nv50 onwards :p
22:09kherbst: it's not in the best state though
22:09kherbst: but.. I'd be willing to fix it all up
22:09RSpliet: kherbst: would you still care about TGSI for NV40 is what I asked? ;-)
22:09kherbst: and for GL 4.6 we need nir _or_ spirv -> nvir
22:09kherbst: soo...
22:09kherbst: I choose nir
22:10kherbst: RSpliet: no compiler for nv40 :p
22:10kherbst: just write a new one based on nir
22:10RSpliet: NV30-NV40 has an assembler right?
22:10kherbst: yes
22:10kherbst: so either write a full compiler or just use nir ;)
22:11airlied: you'd probably get benefits on nv30/40 from using nir and fixing up nir->tgsi :-P
22:11RSpliet: TGSI is actually a really close match to the NV30's 4vec ISA. I wouldn't want to do the NIR thing directly
22:11kherbst: nir does support vector stuff
22:11RSpliet: Then again, I'm a bit ignorant on what NIR can and cannot do
22:11kherbst: nir gives you all the CFG based ops.. and more
22:11kherbst: not that important for nv40 though
22:12kherbst: but it allows you to care about more important things as others also improve nir and so on
22:12kherbst: and there are shit tons of lowering code
22:12kherbst: especially for texture stuff
22:12kherbst: and nirs interface is sane so that you can loop over opts
22:12kherbst: it... simply allows you to spend more time on other things
22:13kherbst: airlied: :D sounds like a plan :D
22:19kherbst: but I am most curious about those RH remarks.. because RH really has nothing to do with the decision to use nir instead of doing a spirv -> nvir pass
22:20airlied: grumpy old man shouts at sea? :-P
22:20imirkin: if you're referring to me, i shout at a wall.
22:21imirkin: eventually those bricks *will* fall, mark my words
22:22airlied:messed up his simpson reference, it's yells at cloud!
22:22imirkin: on the driver's license?
22:22airlied: which in the modern age is actually something you can yell at
22:23airlied: imirkin: yeah that's the one
22:23airlied: I can also see how you can confuse RH with cloud, it's our marketing :-PO
22:25imirkin: brought it back around. well done, sir