00:08nyef: ... Or it's that it's far too easy to not realize that you've not plugged the DPort cable far enough into the back of the monitor. /-:
00:09imirkin: gotta get it in there good
00:10nyef: Gotta rotate the monitor 45 degrees off center in order to have the angle to be able to *see* that it's not in there good enough.
00:10imirkin: really too bad that 3d monitors are no longer going to be a thing. cool idea =/
00:10imirkin: (still a bit off in terms of the actual execution though...)
00:58Horizon_Brave: Good evening folks
01:38noobineer: I have a monitor with its EDID not getting detected correctly, it works but not at the highest res, only up to 1024x768, i was reading this guide (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/EDID/HOWTO.txt) about creating an EDID for my monitor, is that worth doing or should I try making an xorg.conf instead? I'm using ubuntu-mate 16.04 and nouveau
01:40gnarface: noobineer: personally i've had more luck with the custom xorg.conf approach
01:41gnarface: noobineer: i've gone down the path of manually loading the extracted binary EDID too, only to find out that for whatever reason, it didn't help
01:42noobineer: thanks, another question then
01:42noobineer: ./usr/share/X11/xorg.conf.d/ is this the right place to store it?
01:42noobineer: I'm not sure how that system has changed with the automatic detection stuff nowadays
01:43gnarface: it varies depending on distro
01:44gnarface: if you look at the top of your Xorg.0.log (the location of which sadly also now varies depending on distro) you can see where it looks for it. /etc/X11/xorg.conf is probably the FIRST place it looks though in most cases. any directory named "xorg.conf.d" is for secondarily loading partial snippets of xorg.conf
01:44noobineer: well, I guess one thing I'm not sure about, is can I write just a partial config and only define the sections I want to change like display, device, screen? or should it be the whole config in one?
01:45gnarface: my *personal* experience has been better just using a whole xorg.conf, but it has helped a lot that i'd been using it since well before the move to auto-detect everything
01:45noobineer: ok that's kinda what I thought
01:45gnarface: in theory though, yes you should be able to just write a partial to override just parts
01:45gnarface: most distros actually ship by default with several such partials these days, and NO main xorg.conf
01:46orbea: that is what I do, just define the things I want to change and no main xorg.conf
01:46gnarface: (the partials mostly to do with various input devices/drivers)
01:46noobineer: so do you name the partials anything in particular? I was seeing it called 20-nouveau.conf in an example somewhere
01:47gnarface: i think they have to end with .conf, otherwise the numbering is just to control what order they load
01:48noobineer: alright, thanks for the help explaining
01:48gnarface: no problem
01:49Horizon_Brave: +1 for remaining quiet and learning something from someone else's questions xD
03:55nyef: Bleh. intel-gpu-tools opens the *first* card available, and there's no override.
03:56nyef: God forbid someone should have two or more cards in their system, and even worse would be to have them all use the same driver. /-:
04:45imirkin: nyef: not a common use-case with intel gpu's :)
04:55nyef: No, I imagine not, what with them having that horrid piece of junk on each GPU die.
05:27nyef: (In retrospect, I probably should have put an emoticon on that last line... except that I'm really not a fan of x86 type CPUs.)
07:00karolherbst: imirkin: if you wanna check out hitman: use my "hitman" branch and start the game like this: MESA_VENDOR_OVERRIDE="ATI Technologies Inc." gdb --args ./HitmanPro -ao START_BENCHMARK true -ao BENCHMARK_SCENE_INDEX 1 -ao AUTO_QUIT_ENGINE 120 ConsoleCmd UI_ShowProfileData 1 ConsoleCmd EnableFPSLimiter 0 -ao FullScreen 0 ConsoleCmd settings_vsync 0 ConsoleCmd settings_SetHDR 0 -ao RESOLUTION 1280x720
07:00karolherbst: that's what I used for my testing, gdb is important, otherwise the game catches the crashes and doesn't print any traces
07:00karolherbst: won't be able to do anything the next few days I think
07:01karolherbst: HitmanPro is within the bin folder
07:01karolherbst: and you might want to delete libcurl from lib
07:49pq: imirkin, nyef, as igt is no longer intel-specific, such issues would be well worth reporting.
17:48nyef: ... X won't start unless there's a display connected to the HDMI on the second video card, but it then won't *use* that second video card for output? What?
17:50nyef: Oh, and this is on 4.10. On 4.9, it's fine.
17:50nyef: (Well, 4.9.6.)
17:51pmoreau: Maybe xf86-video-nouveau 1.14 helps?
17:52imirkin_: there is also some kind of issue with multi-screen on 4.10
17:59nyef: Yeah, I'm sort of trying to reproduce on gf119 some HPD and related total system lockup things that I've found on gt215.
17:59nyef: I might just move on and come back to it mid-to-late 4.11-rc or even once 4.11 is released.
18:00imirkin_: gotta pick your battles
18:01Echelon9-away: Can I find a python + envytools dev to review this trivial cleanups patchset? https://github.com/envytools/envytools/pull/87
18:06Echelon9-away: thanks imirkin
18:06Echelon9-away: anything you want me to review for a +1?
18:07Echelon9-away: my queue: skeggsb has been too busy to look at the HBM mem type, so I plan on just sending a nouveau patch for the new GDDR5X mem type in a day or so
19:01imirkin_: Echelon9-away: don't wait on skeggsb for anything in the next couple of weeks - he's out on vacation i think
19:05Echelon9-away: ok, that's good to know. I'll press ahead
19:42karolherbst: how can I dump shaders again now with the new style? :/
19:43karolherbst: ahh MESA_SHADER_CAPTURE_PATH
19:47nyef: ... Not finding a third set of infoframe registers on gf119. Doesn't mean that there isn't a set, just that I'm not finding one.
19:47imirkin_: look harder!
19:49imirkin_: nyef: also check if perhaps the gk104 stuff is really where the gf119 infoframe stuff is :)
19:50imirkin_: hm probably not
19:50imirkin_: i suspect that gf119 is a lot more similar to gt215 than gk104
19:51karolherbst: imirkin_: I've pushed the hitman pro shaders
19:51karolherbst: 1409 unique ones :(
19:51imirkin_: any that we do a horrible job compiling?
19:52nyef: For now, I'm going to do what I did for gk104: Swipe the audio infoframe registers, since they're not actually used to set the audio infoframe.
19:53karolherbst: currently checking
19:53karolherbst: it takes time to compile those
19:53karolherbst: total local used in shared programs : 6845 -> 6845 (0.00%)
19:53karolherbst: I guess this is kind of bad
19:54nyef: I'm also considering working out the overall register map for PDISPLAY as best I can understand it, since I just don't have a good mental model for it at this point.
19:59imirkin_: feel free to update the pdisplay stuff in rnndb
19:59imirkin_: you're aware of the 'lookup' tool i presume?
20:03nyef: Vaguely aware.
20:04karolherbst: by running some passes again: total local used in shared programs : 6845 -> 6601 (-3.56%)
20:04imirkin_: nyef: try it. lookup -a d9 61c520
20:04imirkin_: er oops
20:04imirkin_: lookup -a d9 616798
20:04nyef: The issue isn't doing the lookup, it's "what is the structure of this register space over the various DISP versions?"
20:04imirkin_: yeah, i get it...
20:05imirkin_: stuff is generally named so that the same names across disp versions means roughly the same things
20:07karolherbst: imirkin_: how long would it take to implement a statistic, which tells us how often a certain shader combination is executed?
20:12imirkin_: qapitrace has some thing which measures time spent in shaders
20:13karolherbst: okay, lots of spilling in shaders
20:13imirkin_: but yeah, it'd be nice to say we spend X time in this shader, etc
20:13imirkin_: maybe we can do something clever with counters
20:13karolherbst: but time spent in shaders is also nice
20:14imirkin_: unfortunately diff shaders can run in parallel...
20:14karolherbst: but we have shaders like this: type: 5, local: 404, gpr: 63, inst: 3802, bytes: 34768
20:14karolherbst: this is bad in itself
20:14imirkin_: yeah, you need to get more gpr's :)
20:15imirkin_: that's a compute shader too. ouch.
20:15karolherbst: takes 5 seconds to compile
20:15imirkin_: hardly seems worth it :)
20:16karolherbst: hum, am I doing something wrong? NV50_PROG_DBEUG=3
20:16imirkin_: i think one usually spells it "DEBUG"
20:16karolherbst: yeah, that helps
20:17karolherbst: that shader
20:18imirkin_: this is the most useless code ever
20:18imirkin_: set u32 $r13 lt f32 $r7 $r12
20:19imirkin_: slct u32 $r13 eq $r63 0xffffffff $r13
20:19imirkin_: which ... == $r13. or the inverse of r13.
20:19imirkin_: (i never remember the arg order)
20:19imirkin_: and then these things are or'd together
20:20karolherbst: I guess with a nice opt pass, we can eliminate a lot here
20:20imirkin_: a lot of ops, although it won't translate into fewer registers
20:21karolherbst: I don't mind this
20:21karolherbst: less instructions is already good
20:23karolherbst: so slct on a bool eq 0x0 0xffffffff is the bool itself
20:23imirkin_: or !bool. would have to think.
20:23imirkin_: (aka look up wtf slct does)
20:23karolherbst: if the condition is true src1 is returned otherwise src2
20:24imirkin_: what are you looking at?
20:24karolherbst: I remember it this way, I will verify
20:26karolherbst: ohhh wait
20:26karolherbst: this comment: "SLCT(a, b, const) -> cc(const) ? a : b"
20:26karolherbst: src2 is the input
20:27imirkin_: yeah that sounds right.
20:27karolherbst: and this is checked against... 0
20:27karolherbst: I think
20:27imirkin_: so... eq 0 -1 a => !a
20:28karolherbst: no... I think that src2 is checked against 0 with the CC on true: src0 on false: src1
20:28karolherbst: or was that set?
20:29imirkin_: that is correct.
20:29karolherbst: slct->getSrc(2)->asImm()->compare(slct->asCmp()->setCond, 0.0f)
20:29karolherbst: is inside the code
20:29imirkin_: oh right
20:29imirkin_: a == 0 ? 0 : -1
20:29imirkin_: ok yeah. so that is just "a"
20:29karolherbst: sounds about right
20:30karolherbst: slct(0, -1, bool) -> bool
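The identity reached above (slct eq with sources 0 and -1 just returns its boolean input) can be checked with a small model. This is a minimal sketch, assuming the semantics stated in the discussion: src2 is compared against 0, src0 is returned when the condition holds, src1 otherwise. The names `slct_eq`, `kFalse` and `kTrue` are hypothetical and are not Mesa identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the semantics described in the log, not Mesa code:
// slct_eq(src0, src1, src2) yields src0 when src2 == 0, otherwise src1.
constexpr uint32_t slct_eq(uint32_t src0, uint32_t src1, uint32_t src2)
{
    return src2 == 0 ? src0 : src1;
}

// A u32 "bool" as produced by set: 0 for false, 0xffffffff (-1) for true.
constexpr uint32_t kFalse = 0u;
constexpr uint32_t kTrue = 0xffffffffu;

// slct_eq(0, -1, b) returns b unchanged for both boolean values.
static_assert(slct_eq(kFalse, kTrue, kFalse) == kFalse, "false stays false");
static_assert(slct_eq(kFalse, kTrue, kTrue) == kTrue, "true stays true");

// The identity only holds for canonical booleans: any other non-zero
// input is normalised to -1, so the source must really come from a set.
static_assert(slct_eq(kFalse, kTrue, 5u) == kTrue, "non-bool input is not preserved");
```

The last assertion is the reason the optimisation has to confirm that src2 is produced by a u32 set before replacing the slct with a mov.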
20:31karolherbst: what instructions can produce a bool? set and what else?
20:31imirkin_: just set i think
20:31karolherbst: do I have to check against u32 as well?
20:31imirkin_: there's a function that's like
20:31imirkin_: or something.
20:32imirkin_: the type should match the set
20:32imirkin_: set can produce either a u32 bool 0/-1 or a f32 "bool float" 0/1.0
20:32karolherbst: slct(true, false, set) -> set (as long as slct.dType == set.dType)
20:32imirkin_: well, for slct eq
20:32imirkin_: there's slct ne
20:32imirkin_: and slct lt
20:32karolherbst: ohh right
20:33karolherbst: slct_eq(true, false, set) -> set
20:33imirkin_: yeah, but it's easy enough to handle the slct_ne case too
20:33karolherbst: mhh okay, I'll check if it's worth it and concentrate on slct_eq(true, false, set) -> set first
20:34karolherbst: there is also slct_eq(false, true, set) -> !set
20:34karolherbst: and so on
20:34karolherbst: inside which opt should I put it? algebraicopt?
20:35karolherbst: or rather ConstantFolding?
20:35karolherbst: I would tend to the first one
20:37imirkin_: i'd have to check the code
20:37imirkin_: we already do some amount of this
20:37imirkin_: do it in the same place where we do this ;)
20:37imirkin_: wherever that gets called, add your logic in there
20:37imirkin_: it's dealing with a slightly different case
20:37karolherbst: findSomethingWithZero not found
20:37imirkin_: but i think it's similar enough
20:38imirkin_: it's under ConstantFolding it seems
20:38imirkin_: so i'd add stuff there.
20:38imirkin_: it likely won't fit exactly into the current framework
20:38karolherbst: inside opnd3?
20:39imirkin_: er well
20:39imirkin_: i think expr() will get called for it
20:39imirkin_: since src0 and src1 will be immediates
20:39karolherbst: makes sense
20:40karolherbst: there is a if ( a==b) res.data.u32 = a->data.u32; .. oh well
20:40karolherbst: should be easy to add it there
20:42karolherbst: that expr function does odd things at the end
20:42imirkin_: yeah, you don't want any of that
20:42imirkin_: so you'll want to handle stuff + return
20:42karolherbst: mhh makes sense :D
20:42imirkin_: it's meant for "real" constant folding
20:42imirkin_: e.g. add 2 + 3 gets replaced with a mov 5
20:43karolherbst: the current slct switch does "slct(a, a, b) -> a" right?
20:43imirkin_: not in constant folding though...
20:44imirkin_: i'm guessing in algebraicopt?
20:44imirkin_: oh right. i see.
20:44karolherbst: okay, just wanted to make sure, cause I will add a comment
20:44imirkin_: yes. if a is immediate and a == b
20:44imirkin_: which i guess can happen due to various stupidity
20:46karolherbst: is imm0 == src0 and imm1==src1 in expr?
20:46karolherbst: yeah, seems that way
20:57karolherbst: nice... there is no eq at that point :(
20:59karolherbst: imirkin_: can I simply do imm0.reg.data.u32 == 0 and imm1.reg.data.u32 == -1 or is there something super smart I could use?
20:59imirkin_: that's it
20:59imirkin_: so remember that this is all before LoadPropagation
21:00imirkin_: so the values are all GPR's
21:00imirkin_: and then that getImmediate() figures out if the GPR has an immediate value
21:00karolherbst: yeah sure, I have the immediates already, that's not my problem
21:00karolherbst: I was more like finding a way _not_ to check for the real type
21:01karolherbst: sadly ImmediateValue::compare only works for f32
21:04karolherbst: yeah well, then I still have that -1/1 issue for u32 vs f32 floats, haven't I?
21:04imirkin_: can't win 'em all
21:04imirkin_: i wouldn't worry about set's that return float
21:04imirkin_: they're fairly rare
21:04karolherbst: maybe I should just write a helper function for that
21:04karolherbst: ohh okay, then u32 it is for now
21:09karolherbst: imirkin_: okay, they come in as slct_ne(-1, 0, ibool)
21:10karolherbst: imirkin_: do I have to replace the slct with a cvt or is mov fine?
21:11imirkin_: if you neg it, yeah. otherwise mov is fine.
21:11karolherbst: slct_ne(-1, 0, ibool) -> ibool, right?
21:12karolherbst: uhhh, i is a CmpInstruction, does that matter?
21:12imirkin_: yes. make a new instruction, sink this one.
21:13karolherbst: I just have to create a new one after i, and bind the def, delete_Instruction(i); right?
21:15karolherbst: bld.mkOp1(OP_MOV, TYPE_U32, i->getDef(0), i->getSrc(2)); ?
21:18karolherbst: 100 instructions killed here
21:27karolherbst: hum... https://gist.github.com/karolherbst/33d2eed71ac76812d7fd758828414423
21:27karolherbst: why 0.53% more locals :(
21:32imirkin_: can't win 'em all
21:32karolherbst: that result looks pretty much like ... useless
21:32imirkin_: i dunno. -0.5% instructions sounds nice.
21:32karolherbst: yeah, but 0.5% locals is bad
21:33karolherbst: but I guess we will be able to fix that
21:33imirkin_: yeah, it's a little sad.
21:33karolherbst: can't I use OP_NEG in the negated case directly?
21:34karolherbst: uhhh wait
21:34karolherbst: how do I negate a bool? :/
21:37karolherbst: if I do it for all 4 combinations the result gets worse....
21:43karolherbst: over our shader-db: total instructions in shared programs : 3902748 -> 3897016 (-0.15%)
21:43karolherbst: and just 0.04% more locals
21:43karolherbst: the locals are all from hitman though
21:44karolherbst: hurt also a few F1 gprs
21:45karolherbst: it only helped f1, tomb_raider and hitman... all feral games
21:47imirkin_: if (cmp->dType != TYPE_U32 && slct->dType != TYPE_U32)
21:47imirkin_: probably want ||
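The difference between the two operators is easy to see in isolation. This is a minimal sketch, assuming the quoted condition guards an early bail-out that should trigger whenever either instruction is not u32; the enum and the function names are hypothetical stand-ins, not the Mesa types.

```cpp
#include <cassert>

// Hypothetical stand-in for the data type enum; only two values are needed.
enum DataType { TYPE_U32, TYPE_F32 };

// As quoted: bails out only when BOTH types are non-u32.
constexpr bool bail_with_and(DataType cmp, DataType slct)
{
    return cmp != TYPE_U32 && slct != TYPE_U32;
}

// As suggested: bails out when EITHER type is non-u32.
constexpr bool bail_with_or(DataType cmp, DataType slct)
{
    return cmp != TYPE_U32 || slct != TYPE_U32;
}

// A mixed pair slips past the && version but is rejected by the || version.
static_assert(!bail_with_and(TYPE_F32, TYPE_U32), "&& lets a mixed pair through");
static_assert(bail_with_or(TYPE_F32, TYPE_U32), "|| rejects a mixed pair");
static_assert(!bail_with_or(TYPE_U32, TYPE_U32), "|| still accepts u32/u32");
```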
21:48imirkin_: CmpInstruction *cmp = findOriginForTestWithZero(i->getSrc(2));
21:48imirkin_: that's not quite right
21:48imirkin_: we don't care about it being a test with zero
21:48imirkin_: i'd just do CmpInstruction *cmp = i->getSrc(2)->getInsn()->asCmp();
21:48imirkin_: (that might crash, so guard it on getInsn() != null)
21:49imirkin_: although.... at that point, it really should be != null
21:49karolherbst: I don't care if that's a cmp though
21:49karolherbst: cause I don't access it
21:50karolherbst: I just check the op and that should be good enough
21:50imirkin_: that's fine
21:50karolherbst: mhh, but it could be a mov or so
21:50karolherbst: or not?
21:51imirkin_: yeah, it could =/
21:51imirkin_: that's why findOriginForTestWithZero
21:51imirkin_: has that loop
21:51imirkin_: but it probably won't be :)
21:52karolherbst: I could work that OP_ out a bit, but I doubt it's really worth it
21:53karolherbst: we should add compiled shaders into the compare script
21:53karolherbst: because some are failing
21:53karolherbst: or handle that case better
21:54karolherbst: "nvc0_program_translate:609 - shader translation failed: -4"
21:54imirkin_: go for it :)
21:54karolherbst: I just got a shader-db/run crash again...
22:11karolherbst: mhh nice, with the fixes, 4 instruction less optimized
22:13karolherbst: imirkin: wasn't there a way to check for U32 and S32 at the same time, or doesn't it matter here?
22:16imirkin_: not sure
22:20karolherbst: lol " On Linux, VDPAU can now be used for displaying frames. It's used to achieve smoother framerates because it allows precise tracking and scheduling of frame times. Added ogl_bEnableVDPAU cvar (disabled by default) to control whether VDPAU is used - requires gfxRestart()."
22:20karolherbst: this is part of the talos principle changelog :D
22:20karolherbst: it's from 2015 though
22:22karolherbst: well, if it works :D
22:28karolherbst: newer version of the shader: https://gist.githubusercontent.com/karolherbst/fbfa391895cc8de8bc79541a6846e868/raw/fb81438f25a63ced254421386f6887d03f78d924/gistfile1.txt
22:28imirkin_: 127: set u32 $r12 neu f32 $r3 1.000000 (8)
22:28imirkin_: 131: and u32 $r12 $r12 0x3f800000 (8)
22:29imirkin_: i have an opt that's supposed to pick that up =/
22:29imirkin_: i guess it's not getting triggered coz of something dumb
22:29karolherbst: maybe that is generated too late?
22:29imirkin_: anyways, that's supposed to become set f32 $r12 ...
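Why the set u32 / and 0x3f800000 pair can collapse into a single set f32: masking the integer bool (0 or -1) with the bit pattern of 1.0f yields exactly the bits of 0.0f or 1.0f. This is a minimal sketch of that equivalence; the helper names are hypothetical, and C++ `!=` is used to stand in for the unordered not-equal ("neu") comparison because it is also true when either operand is NaN.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of "set u32 ... neu f32": integer bool, 0 or -1.
inline uint32_t set_u32_neu(float a, float b)
{
    return (a != b) ? 0xffffffffu : 0u;
}

// Hypothetical model of "set f32 ... neu f32": float bool, 0.0f or 1.0f.
inline float set_f32_neu(float a, float b)
{
    return (a != b) ? 1.0f : 0.0f;
}

// Reinterpret a float as its raw 32-bit pattern.
inline uint32_t float_bits(float f)
{
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// The two-instruction sequence from the log: set u32, then and with
// 0x3f800000 (the bit pattern of 1.0f).
inline uint32_t set_then_and(float a, float b)
{
    return set_u32_neu(a, b) & 0x3f800000u;
}
```

For any inputs, `set_then_and(a, b)` carries the same bits as `float_bits(set_f32_neu(a, b))`, which is what makes the fold safe in this model.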
22:30imirkin_: also i'm surprised that all those or's/and's survive... it should be done with set_and
22:30imirkin_: and set_or
22:30karolherbst: maybe cause it is all u32?
22:30imirkin_: might not have an opt for that...
22:32karolherbst: do you want to look into it or shall I?
22:32imirkin_: i don't really have time for much right now
22:33imirkin_: also i'd kinda recommend slct where the cond is the result of a set to get converted into SELP and make the set produce a predicate.
22:33imirkin_: since that should reduce register pressure a tad
22:33karolherbst: I have such an opt
22:33imirkin_: 3381: set u32 $r2 ne $r2 $r63 (8)
22:33imirkin_: 3382: slct u32 $r1 ne $r15 $r1 $r2 (8)
22:34imirkin_: sequences like this are foolish
22:34imirkin_: the set is unnecessary.
22:34karolherbst: let me search for it
22:34imirkin_: since it's comparing some value against 0
22:34imirkin_: the slct can take care of that.
22:35imirkin_: we ought to have a slct(set) algebraic opt for that
22:36karolherbst: it should do more than what's inside the comment
22:36imirkin_: (we already have a very basic SLCT one, but it can do more.)
22:42karolherbst: ohh, now I know what you mean
22:43karolherbst: so instead of using a gpr every slct(set) should use a predicate instead
22:43karolherbst: and slct needs to be converted to selp
22:43karolherbst: or can slct read a predicate?
22:47imirkin_: i think you're missing the point
22:47imirkin_: what is the set doing?
22:48karolherbst: return a boolean value
22:48imirkin_: based on whether ...
22:49karolherbst: what do you mean?
22:49imirkin_: 3381: set u32 $r2 ne $r2 $r63 (8)
22:49karolherbst: that looks silly....
22:49imirkin_: so it returns a boolean value based on ... what
22:50karolherbst: comparison against 0?
22:50imirkin_: uh huh
22:50imirkin_: and then you take the result of that comparison
22:50imirkin_: and feed it as the cc arg of a slct
22:50imirkin_: which compares it against ...
22:50imirkin_: so the set is a little unnecessary
22:50karolherbst: I see
22:51karolherbst: even the CC is the same
22:51imirkin_: well, as long as slct can support the same setCond, it should be the same
22:51imirkin_: might need some careful flipping sometimes
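The point being made is that a set against 0 feeding the condition source of a slct is redundant, because the slct already compares that source against 0. This is a minimal sketch of that for u32 values, assuming the slct semantics agreed on earlier in the log (condition tested on src2, src0 on true, src1 on false); all function names are hypothetical. The last pair of checks illustrates the "careful flipping": a set eq in front of a slct ne folds into a slct eq.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of u32 set: integer bool, 0 or -1.
constexpr uint32_t set_ne(uint32_t a, uint32_t b) { return a != b ? 0xffffffffu : 0u; }
constexpr uint32_t set_eq(uint32_t a, uint32_t b) { return a == b ? 0xffffffffu : 0u; }

// Hypothetical models of slct: src2 is tested against 0.
constexpr uint32_t slct_ne(uint32_t s0, uint32_t s1, uint32_t s2) { return s2 != 0 ? s0 : s1; }
constexpr uint32_t slct_eq(uint32_t s0, uint32_t s1, uint32_t s2) { return s2 == 0 ? s0 : s1; }

// Original sequence: set ne x, 0 followed by slct ne a, b, <set result>.
constexpr uint32_t with_set(uint32_t a, uint32_t b, uint32_t x)
{
    return slct_ne(a, b, set_ne(x, 0u));
}

// Folded form: the slct tests x directly.
constexpr uint32_t without_set(uint32_t a, uint32_t b, uint32_t x)
{
    return slct_ne(a, b, x);
}

static_assert(with_set(10u, 20u, 0u) == without_set(10u, 20u, 0u), "x == 0");
static_assert(with_set(10u, 20u, 7u) == without_set(10u, 20u, 7u), "x != 0");

// Flipped case: set eq x, 0 + slct ne behaves like slct eq on x.
static_assert(slct_ne(10u, 20u, set_eq(0u, 0u)) == slct_eq(10u, 20u, 0u), "flip, x == 0");
static_assert(slct_ne(10u, 20u, set_eq(7u, 0u)) == slct_eq(10u, 20u, 7u), "flip, x != 0");
```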
22:52karolherbst: true is 0 for u32? or do I mix something up again
22:52imirkin_: 0 is always false
22:52imirkin_: -1 for u32 is true, 1.0 for f32
22:53karolherbst: I am too tired... I don't even know why what I did for the other opt is right anymore...
22:53imirkin_: as long as there are fewer instructions it must be right =]
22:53karolherbst: has to be
22:54karolherbst: it also looks the same (tm)
22:54karolherbst: although I guess I should see a difference really fast
22:54karolherbst: or maybe not
22:58karolherbst: that "3388: slct u32 $r9 ne $r63 $r1 $r0"....
22:59karolherbst: oh man
22:59imirkin_: that's reasonable.
22:59karolherbst: $r1 is the result of a set
22:59karolherbst: same as the $r0
22:59imirkin_: yeah, but there's no way around that
23:00imirkin_: you might be able to or/and it somehow
23:00karolherbst: I wouldn't be too sure about this. this entire block looks a bit... oddisch
23:00imirkin_: heh. is that "odd in germany"? :p
23:01karolherbst: look oddish up
23:01imirkin_: a lot of german words have "sch" in them
23:01imirkin_: while almost no english words end in that
23:01karolherbst: that you mean
23:01karolherbst: "komisch" is german for odd
23:03karolherbst: what's s =
23:03imirkin_: shared memory
23:03imirkin_: shared between invocations
23:03karolherbst: I see
23:07karolherbst: 3366 and 3367
23:08imirkin_: a&~b vs b&~a
23:08karolherbst: better, those are results of sets
23:11karolherbst: no idea, that code looks stupid
23:19imirkin_: 3373: mov u32 $r12 0x00000001 (8)
23:19imirkin_: 3374: slct u32 $r3 eq $r12 0x00000002 $r1 (8)
23:19imirkin_: a very astute optimizer might notice that $r1 came from a set u32
23:19imirkin_: and would convert this whole thing into an add
23:20imirkin_: [because the two options are off-by-one]
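A sketch of that last observation: when $r1 is a u32 bool (0 or -1), selecting between 1 and 2 is the same as computing 1 - $r1 in wrapping 32-bit arithmetic, in other words an add of the negated bool. The function names are hypothetical, and expressing it as a subtraction is one assumed way to write the add.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the quoted "slct u32 eq $r12 0x2 $r1" with $r12 = 1:
// returns 1 when r1 == 0, otherwise 2.
constexpr uint32_t select_form(uint32_t r1)
{
    return r1 == 0 ? 1u : 2u;
}

// Arithmetic form: 1 - r1 with unsigned wraparound.
// r1 = 0 gives 1; r1 = 0xffffffff (-1) gives 1 + 1 = 2.
constexpr uint32_t add_form(uint32_t r1)
{
    return 1u - r1;
}

static_assert(select_form(0u) == add_form(0u), "false case");
static_assert(select_form(0xffffffffu) == add_form(0xffffffffu), "true case");

// Only valid for canonical bools, hence the need to know $r1 came from a set.
static_assert(select_form(5u) != add_form(5u), "non-bool input differs");
```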