00:03 karolherbst: imirkin_: add(c0[A], c0[A]) -> mul(c0[A], 2) or shl(c0[A], 1)
00:03 karolherbst: but dealing with those memory locations is always messy when writing ops :(
00:04 imirkin_: ?
00:04 karolherbst: well when you read twice from the same region and add those up, you could also just read once and mul/shift ;)
00:04 imirkin_: right
00:04 karolherbst: *address
00:04 imirkin_: i thought we did that
00:04 imirkin_: oh wait
00:04 karolherbst: apparently not that good
00:05 imirkin_: we have the inverse
00:05 imirkin_: mul(foo, 2) -> add(foo, foo)
00:05 karolherbst: ohhhhhhh
00:05 karolherbst: yeah
00:05 karolherbst: I see it now
00:05 karolherbst: I wasn't looking pre SSA
00:05 karolherbst: why though?
00:06 imirkin_: fadd is better than fmul?
00:06 karolherbst: add faster than mul?
00:06 karolherbst: mhh
00:06 karolherbst: how does shift compare to fadd?
00:06 imirkin_: and imul is slow too iirc
00:06 karolherbst: uhm
00:06 imirkin_: well, shift won't really work for floats :)
00:06 karolherbst: yeah
00:06 karolherbst: shift vs iadd
00:06 imirkin_: same i'd think
00:07 karolherbst: okay, so we could add a special case to do mul(foo, 2) -> shl(foo, 1) for ints and nice constants?
00:07 imirkin_: sounds like you're micro-optimizing
00:07 karolherbst: well
00:07 karolherbst: I end up with two reads from c0[0x50] :)
00:07 imirkin_: we already do mul(pot) -> shl
00:07 imirkin_: but perhaps it loses out to the special "2" case.
00:07 imirkin_: you can exclude integers form it
00:07 imirkin_: from*
00:08 karolherbst: ahh
00:08 karolherbst: mul(a, 2) -> add is the special case
00:09 imirkin_: yes.
00:09 karolherbst: I guess this runs before mul(pot) -> shl
00:09 imirkin_: likely.
00:09 karolherbst: yeah
00:09 karolherbst: the shl case is right below :)
00:09 karolherbst: maybe just reorder?
00:09 imirkin_: wtvr
00:09 imirkin_: or add isFloatType()
00:10 karolherbst: well the shl case already does that
00:10 karolherbst: I am just wondering if we should prefer shl over add even for 2
00:10 karolherbst: because a shl(a, 1) is much nicer than an add(a, a)
00:10 karolherbst: or not?
00:11 imirkin_: i mean ... add && isFloatType to the == 2 case
00:11 imirkin_: but like i said, wtvr
00:11 karolherbst: mhh
00:12 karolherbst: yeah, wtvr
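The rule-ordering issue discussed above can be sketched as a toy model. This is illustrative Python, not nouveau's actual codegen: the function names `lower_mul_current` and `lower_mul_with_float_check` and the tuple representation of ops are assumptions made here. It shows how a `== 2` special case that runs first shadows the power-of-two shift rule for integers, and how gating that case on a float check lets integers fall through to `shl`.

```python
def is_pow2(n):
    """True for positive integer powers of two (1, 2, 4, ...)."""
    return isinstance(n, int) and n > 0 and (n & (n - 1)) == 0


def lower_mul_current(src, imm, is_float):
    """Rule order as described in the log: the '2' case is checked first."""
    if imm == 2:
        return ("add", src, src)
    if not is_float and is_pow2(imm):
        return ("shl", src, imm.bit_length() - 1)
    return ("mul", src, imm)


def lower_mul_with_float_check(src, imm, is_float):
    """Same rules, but the '2' case only applies to float multiplies."""
    if imm == 2 and is_float:
        return ("add", src, src)
    if not is_float and is_pow2(imm):
        return ("shl", src, imm.bit_length() - 1)
    return ("mul", src, imm)
```

With the first ordering an integer `mul(a, 2)` becomes `add(a, a)`; with the float check it becomes `shl(a, 1)`, while a float `mul(a, 2.0)` still becomes an add.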
00:34 karolherbst: imirkin_: I actually got some hurt shaders, all inside those dolphin ubershaders. It's where we can eliminate/merge adds/shifts/ands by having a smarter order
00:34 imirkin_: could teach the other opt to be smarter
00:34 karolherbst: yeah
00:35 karolherbst: but... and(add(shl(a, 4), 0xffffffe0), 0xfffffff0)
00:35 karolherbst: the and is pretty useless here
00:35 karolherbst: just writing that opt might be annoying
00:36 imirkin_: hard to notice that.
00:36 imirkin_: you could distribute it
00:36 imirkin_: but that's not generally legal
00:36 karolherbst: yeah
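Why the outer `and` is redundant in that expression follows from a known-zero-bits argument: `shl(a, 4)` has its low 4 bits clear, `0xffffffe0` has its low 5 bits clear, and adding two values whose low k bits are clear cannot set any of those k bits. The sum therefore already has its low 4 bits clear, and masking with `0xfffffff0` changes nothing. A minimal Python sketch under those assumptions (the helper names are made up here, and this is not an existing nouveau pass):

```python
MASK32 = 0xFFFFFFFF


def trailing_zero_bits(value):
    """Number of low bits of a 32-bit constant that are zero (32 for 0)."""
    value &= MASK32
    if value == 0:
        return 32
    return (value & -value).bit_length() - 1


def and_is_redundant(shift, add_imm, and_imm):
    """True if and(add(shl(a, shift), add_imm), and_imm) is a no-op for all a.

    Conservative: only handles masks of the form 'clear the low n bits'.
    """
    known_zero = min(shift, trailing_zero_bits(add_imm))
    cleared_by_mask = trailing_zero_bits(and_imm)
    # The mask must be all ones above the bits it clears.
    expected_mask = (MASK32 << cleared_by_mask) & MASK32
    mask_is_high_ones = (and_imm & MASK32) == expected_mask
    return mask_is_high_ones and cleared_by_mask <= known_zero


def evaluate(a, shift, add_imm, and_imm):
    """Reference evaluation with 32-bit wraparound: (before mask, after mask)."""
    summed = (((a << shift) & MASK32) + add_imm) & MASK32
    return summed, summed & and_imm
```

Distributing the mask over the add, by contrast, is not valid in general: with x = y = 8 and mask 0xf, (x + y) & 0xf is 0 while (x & 0xf) + (y & 0xf) is 16.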
00:36 karolherbst: but then we have a different thing
00:37 karolherbst: add(shl(add(shl(a, 1), 0xfffffffe), 4), 0xffffffe0) parts are from above
00:37 HdkR: Dolphin always causing issues :)
00:37 imirkin_: not really
00:37 imirkin_: just shitty code
00:38 imirkin_: crap in, crap out
00:38 imirkin_: i wouldn't worry about an extra op here and there
00:38 imirkin_: esp if it's something innocuous like a mov
00:38 karolherbst: I guess we could write an opt for shl(add(shl(a, b), c), d) -> add(shl(a, b + d), c << d)
00:38 karolherbst: or something
00:38 karolherbst: but again...
00:39 karolherbst: really hard to actually bring me to care enough about it
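A quick sanity check of the fold proposed above, as a Python sketch with 32-bit wraparound. One detail worth noting: when the outer shift is distributed over the add, the constant ends up shifted by the outer amount `d`, since (x + c) << d equals (x << d) + (c << d) modulo 2^32. The function names are hypothetical, and the sketch assumes b + d < 32 so that hardware shift-amount clamping does not come into play.

```python
MASK32 = 0xFFFFFFFF


def original(a, b, c, d):
    """shl(add(shl(a, b), c), d) on 32-bit values."""
    inner = (((a << b) & MASK32) + c) & MASK32
    return (inner << d) & MASK32


def folded(a, b, c, d):
    """add(shl(a, b + d), c << d) on 32-bit values."""
    shifted_a = (a << (b + d)) & MASK32
    shifted_c = (c << d) & MASK32
    return (shifted_a + shifted_c) & MASK32


def check(values, shifts, c):
    """Compare both forms over a small grid of inputs."""
    return all(
        original(a, b, c, d) == folded(a, b, c, d)
        for a in values
        for b in shifts
        for d in shifts
    )
```

Applied to the expression quoted in the log, the inner `shl(add(shl(a, 1), 0xfffffffe), 4)` folds to `add(shl(a, 5), 0xffffffe0)`, after which the two constant adds can merge into a single add of `0xffffffc0`.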
00:39 imirkin_: :)
00:39 imirkin_: welcome to my world
00:41 karolherbst: all I wanted was to fix some silly bindless issues :D
00:41 imirkin_: welcome to my world :)
00:41 imirkin_: you pull on a thread ...
00:41 imirkin_: and the whole ball starts to unravel
00:41 karolherbst: mhh but this one is really annoying. Image handle inside struct and I don't get the image type....
00:41 karolherbst: so I do 1D although I have to do a 2D
00:42 imirkin_: heh
00:42 karolherbst: although actually
00:42 karolherbst: I have the 2D
00:42 imirkin_: 1d might not be the safest default ;)
00:42 karolherbst: so I fixed that
00:42 karolherbst: but
00:42 karolherbst: one of the coords is a nop
00:42 imirkin_: really you should assert
00:42 karolherbst: I would
00:42 karolherbst: but
00:42 karolherbst: nir always gives 4 components :)
00:43 imirkin_: i mean assert when you don't know the image type
00:43 HdkR: Time to switch over to LLVM to get the most pristine of output :P
00:43 karolherbst: ohh I know it, I was just not following the deref correctly
00:43 karolherbst: duh
00:43 karolherbst: nir bug
00:45 karolherbst: "(expression ivec2 f2i (swiz xy (var_ref gl_FragCoord) )) (var_ref color) )" obvious that something wants to use two coords here, no?
00:46 karolherbst: in nir I get :vec4 ssa_5, ssa_1, ssa_1, ssa_1" (ssa_5: gl_FragCoord.x ssa_1: undefined)
00:46 karolherbst: *sigh*
00:46 imirkin_: swiz xy == xyyy
00:46 imirkin_: although undefined for gl_FragCoord.y seems odd :)
00:47 karolherbst: yeah
00:47 karolherbst: nir things it is a 1d image :)
00:47 karolherbst: *thinks
00:47 imirkin_: oh, so that gets dce'd
00:47 imirkin_: welp, good luck!
00:47 imirkin_: probably just mis-attaching something
00:49 karolherbst: it doesn't even get dce
00:49 karolherbst: 'd
00:49 karolherbst: it is the actualy nir input
00:49 karolherbst: *actual
00:49 karolherbst: glsl_to_nir should be the culprit
03:34 imirkin: karolherbst: where does that fs-gatheroffset-uniform-offset.frag test even come from? i don't see it in piglit
03:34 imirkin: ohhh, gatherOffset
03:34 imirkin: gr
04:06 imirkin: karolherbst: also, any objections to my v2 patch "nvc0: restore image binding on RGB10A2, remove from BGR10A2"
04:06 imirkin: (you tested the v1)
08:26 vedranm: imirkin: if you remember, modprobe -r nouveau that was broken on 4.14 on the particular machine seems to work in 4.15
08:54 mastermind193_: i am occupied with jurisdict... stuff, however after some months i can offer my helping hand on cracking the crypto of powering firmware, whatever that currently is!
08:57 mastermind193_: it is what i understood, whoever karolherbst is or what his achievements are, he has some trouble of cracking it, i tried to help with theory but i was not that accurate then, theoretically it is very easy task
08:57 mastermind193_: i am more supporter of this open source branch of the drivers in theory, as i like them more, but nouveau especially has slight work to be yet done
14:04 karolherbst: okay nice, I am pretty much done with the nir thing. +2/-4 passes in full piglit run
14:06 karolherbst: pmoreau: today I will rebase the OpenCL stuff
14:09 pmoreau: Cool! I’ll have absolutely no time to play/test it before the weekend, and possibly even up to the 16th of April.
14:10 pmoreau: karolherbst: I brought my old Radeon HD 6870 back from home, so I should be able to work on clover while not touching Nouveau. :-D
14:10 karolherbst: :D
15:14 karolherbst: pmoreau: done rebasing nouveau_nir_spirv_opencl_v3 :)
15:24 pendingchaos: imirkin: have you looked at patch 4 of the 4th version of the conservative rasterization patches?
15:27 pmoreau: karolherbst: Perfect, thanks!
15:27 karolherbst: pmoreau: test_basic: FAILED 21 of 84 tests :)
15:28 pmoreau: :-) What’s the biggest category of fails?
15:29 karolherbst: non global memory
15:29 karolherbst: and work offsets
15:29 karolherbst: generic pointers as well
15:30 karolherbst: tests with images are just passing because non supported though
15:30 pmoreau: :-D
15:30 pmoreau: Okay, so that’s adding another ~20 failed tests then :-p
15:30 karolherbst: yeah well. i think I would work on images next or try to fix those tiny issues
15:31 karolherbst: I doubt it will be difficult though
15:31 karolherbst: should be quite easy
15:47 imirkin_: pendingchaos: i think i glanced at it and it seemed fine. i'll need to look more carefully. is everything else reviewed? if so, i'll do it tonight and push
15:48 pendingchaos: patches 2 and 3 are reviewed
15:49 pendingchaos: I'm hoping to release an updated patch 1 with a few small changes after patch 4 is looked at
18:11 glisse: danvet: oh i misunderstood your first email
18:11 glisse: i am just worried the rdma folks enforce everybody to register their struct page ...
18:11 danvet: I might have been confusing
18:12 danvet: there's too many totally complicated mm vs. gpu topics floating around right now :-/
18:12 glisse: no likely a lack of coffee in my blood
18:12 danvet: nah, I always wanted dma-buf import to allow non-struct-page backed memory
18:12 danvet: because there's all kinds of funny stuff going on
18:12 danvet: stolen ranges, p2p, numa nodes the kernel doesn't know about
18:13 glisse: iirc Dan from intel tried to introduce pfn_t in more places
18:13 glisse: which is a pfn value with flag
18:13 danvet: iirc I complained to airlied that his first ttm dma-buf importer just dug out the struct page, but sounds like it's all fixed now
18:13 glisse: that can tell you if there is a struct page or not behind
18:13 danvet: yeah I read some of the lwn summaries
18:13 danvet: imo for gpu buffers sgt is good enough
18:13 danvet: it wastes a bit of space for when you don't have a struct page around
18:13 danvet: but oh well
18:14 danvet: gets the job done at least
18:14 glisse: for HMM i want to push my dma changes
18:14 glisse: idea is that HMM fill in the iommu page table directly
18:14 danvet: otoh you can coalesce, so as long as you don't suck too bad at keeping stuff contiguous it should be fine
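The coalescing point can be illustrated with a small sketch: when consecutive pages of a buffer happen to be contiguous in the device's address space, they can be merged into one (address, length) entry, so the table needs far fewer entries than pages. This is a plain Python model, not the kernel's scatterlist API; `coalesce` and the (addr, length) tuples are assumptions for illustration.

```python
def coalesce(segments):
    """Merge adjacent (addr, length) segments that are contiguous.

    Order is preserved; only neighbours that touch exactly are merged.
    """
    merged = []
    for addr, length in segments:
        if merged and merged[-1][0] + merged[-1][1] == addr:
            prev_addr, prev_len = merged[-1]
            merged[-1] = (prev_addr, prev_len + length)
        else:
            merged.append((addr, length))
    return merged


# Five 4 KiB pages, the first three and the last two contiguous.
pages = [
    (0x1000, 0x1000),
    (0x2000, 0x1000),
    (0x3000, 0x1000),
    (0x8000, 0x1000),
    (0x9000, 0x1000),
]
table = coalesce(pages)
```

Here five page-sized entries collapse into two, which is the sense in which keeping buffers reasonably contiguous offsets the per-entry overhead.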
18:14 glisse: i should send rfc later this week
18:14 danvet: cc: dri-devel for this stuff?
18:15 danvet: I'm totally out of the loop on all this hmm stuff :-/
18:15 glisse: yeah dri-devel mm linaro-mm
18:15 danvet: yeah I'm still subscribed to linaro-mm and mm, but long stopped trying to keep up to date on those ...
18:15 glisse: linaro-mm is low traffic i think
18:15 glisse: well i merge all this in same folder
18:16 glisse: but i always had the feeling that linaro was low traffic
18:17 danvet: hm yeah
18:17 danvet: I still have it filtered even
18:17 danvet: I should read it again more regularly
18:18 danvet: once it became epic respins of CMA I kinda stopped
18:18 danvet: anyway, less stuff assuming dma_addr_t (in an sgt or somewhere else) is backed by memory with a struct page, the better imo
18:18 glisse: i wish CMA died ... like why can't soc pay for a 1cents iommu
18:18 danvet: when I spot new dma-buf importers I always try to make them only look at the dma_addr_t
18:19 danvet: or if they do have to look at the struct page, at least check it's there and fail the import if that's not the case
18:19 imirkin_: glisse: hard to change the hw that's already out there
18:19 danvet: e.g. the xen 0copy thing obviously needs a struct page for the grant hypercall
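The importer policy described above can be modelled in a few lines. This is a hypothetical Python sketch, not kernel code: `Entry`, `ImportRejected`, `import_by_dma_addr` and `import_needing_pages` are names invented here, and a `page` of `None` stands in for memory that has no struct page behind it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    dma_addr: int
    length: int
    page: Optional[object] = None  # None models memory with no struct page


class ImportRejected(Exception):
    """Raised when an importer cannot work with the given buffer."""


def import_by_dma_addr(entries):
    """Preferred: an importer that only needs device addresses ignores 'page'."""
    return [(e.dma_addr, e.length) for e in entries]


def import_needing_pages(entries):
    """An importer that genuinely needs pages checks first and refuses otherwise."""
    if any(e.page is None for e in entries):
        raise ImportRejected("buffer is not fully page-backed")
    return [e.page for e in entries]
```

The first importer works for any buffer, including stolen ranges or peer-to-peer memory; the second fails the import cleanly instead of dereferencing a page that does not exist.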
18:20 danvet: also, hw is cheap
18:20 glisse: it's not like cell phone get kernel update :)
18:20 danvet: and apparently the pte walking is too expensive for display in terms of power budget
18:21 glisse: i guess the tlb block verilog is 2 cents on ebay ;)
18:21 danvet: nah, it's the random access needed that blows up the latency budget
18:21 danvet: instead of most minimal streaming reads
18:21 danvet: so you have to refill your fifos more aggressively
18:22 danvet: wake up the memory more often
18:22 danvet: goes all downhill
18:22 danvet: hw people even here at intel regularly freak out about the display tlb fetches :-)
18:23 imirkin_: but then you pat them on the head and say "it'll be ok"
18:23 danvet: otoh we do 5 levels + iommu on each level :-)
18:23 danvet: imirkin_, judging by how much fun we have with underruns, unfortunately not :-/
18:23 imirkin_: heh
18:24 imirkin_: i guess you'd know better, but it seems unlikely that your underrun problems have to do with tlb lookup latency
18:25 imirkin_: just based on personal observation
18:25 danvet: slightly more serious: display has a dedicated pagetable with a shadow to do the iommu lookups needed at pte write time
18:25 danvet: so that pte fetching is a nice linear streaming read
18:26 danvet: but yeah, display hw folks freak out about latency all the time :-)
18:26 imirkin_: well they _really_ have to get their data
18:26 imirkin_: OR ELSE
18:26 imirkin_: all the dgpu's have it local in vram
18:27 imirkin_: i don't remember if it has to be physically contiguous for nvidia, but it might be
18:28 imirkin_: (actually i have no clue if it even goes through the MMU...)
18:47 karolherbst: pmoreau: "[TTM] Could not find buffer object to map" any idea what this is all about?
18:47 karolherbst: I am hitting this in the barrier test
18:58 pmoreau: karolherbst: No clue, sorry
21:56 Lyude: oh sweet
21:56 Lyude: mst fallback retraining stuff is mostly done and fixing up the weston tablet support series is going a lot faster than expected, so it's very likely i'll be back working on nouveau soon :)
21:57 imirkin_: neat
21:57 imirkin_: Lyude: i assume you have some "late" model (maxwell2+) gpu's sitting around ... any chance you have HDMI 2.0 sinks?
21:57 karolherbst: imirkin_: uhm we treat MS levels 0 and 1 pretty much equally inside mesa/gallium, don't we?
21:57 imirkin_: karolherbst: we do.
21:58 Lyude: imirkin_: i sure do, also do you still need that mst testing? I realized the other day I got distracted
21:58 Lyude: oh wait, hdmi 2 sinks
21:58 imirkin_: Lyude: i do.
21:58 karolherbst: imirkin_: yeah, with the new non MS CTS tests I hit an assert where we have 1 <= 0 (old level vs new level or something)
21:58 imirkin_: karolherbst: there's a LOT of confusion about it
21:58 Lyude: mind giving me the stuff you need testing with again? i'll do it now so i don't forget. also, let me look up hdmi 2 and see if I'd have anything for that around
21:58 karolherbst: yeah, I can guess
21:58 karolherbst: imirkin_: KHR-GL45.shader_image_size.advanced-nonMS-*
21:59 imirkin_: Lyude: iirc you tested and said it crashed. but i need more info than just the existence of a crash to debug :)
21:59 Lyude: imirkin_: is this also known as HDMI MHL?
21:59 karolherbst: imirkin_: with mesa master I had to remove all of those and KHR-GL45.copy_image.functional to get a full run :)
21:59 imirkin_: Lyude: https://github.com/imirkin/xf86-video-nouveau
21:59 imirkin_: that's the ddx with the patch to handle DP-MST "stuff"
22:00 karolherbst: imirkin_: did you make changes since last time I tested it?
22:00 imirkin_: Lyude: hdmi 2.0 adds a bunch of things, i'm sure. biggest one is higher frequencies.
22:00 imirkin_: karolherbst: i did not
22:00 karolherbst: ahh
22:00 imirkin_: karolherbst: i don't think you provided me with much to go on
22:00 karolherbst: true
22:00 karolherbst: I could probably get you more information next week
22:01 karolherbst: now that I also have my desktop and various GPUs
22:01 karolherbst: ohh wait, no MST there
22:01 plutoo: does compute class registers overlap with 3d class?
22:01 imirkin_: Lyude: basically you need hdmi 2.0 for 4k@60
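The 4k@60 requirement comes down to pixel clock arithmetic. The sketch below uses the standard CTA-861 timing for 3840x2160 at 60 Hz (4400x2250 total including blanking) and the maximum TMDS clock of 340 MHz for HDMI 1.4 versus 600 MHz for HDMI 2.0; it assumes 8 bits per component RGB, where the TMDS clock equals the pixel clock. The variable and function names are chosen here for illustration.

```python
H_TOTAL, V_TOTAL = 4400, 2250  # 3840x2160 active plus blanking (CTA-861)

HDMI_1_4_MAX_TMDS_MHZ = 340
HDMI_2_0_MAX_TMDS_MHZ = 600


def pixel_clock_mhz(h_total, v_total, refresh_hz):
    """Pixel clock in MHz for a given total raster size and refresh rate."""
    return h_total * v_total * refresh_hz / 1e6


clock_60 = pixel_clock_mhz(H_TOTAL, V_TOTAL, 60)  # 594.0 MHz
clock_30 = pixel_clock_mhz(H_TOTAL, V_TOTAL, 30)  # 297.0 MHz

fits_hdmi_1_4_at_60 = clock_60 <= HDMI_1_4_MAX_TMDS_MHZ
fits_hdmi_2_0_at_60 = clock_60 <= HDMI_2_0_MAX_TMDS_MHZ
fits_hdmi_1_4_at_30 = clock_30 <= HDMI_1_4_MAX_TMDS_MHZ
```

At 60 Hz the mode needs 594 MHz, which exceeds the HDMI 1.4 limit but fits under HDMI 2.0; the same mode at 30 Hz needs 297 MHz and fits within HDMI 1.4.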
22:01 karolherbst: *sigh*
22:01 imirkin_: plutoo: class methods you mean? yes, sometimes. not always.
22:01 imirkin_: karolherbst: it looks like between all of us, we have the requisite equipment
22:02 imirkin_: but unfortunately, the cables aren't long enough :)
22:02 karolherbst: well, how does nvidia do the display over wifi thing?
22:02 karolherbst: should work through the internet as well, no?
22:02 imirkin_: plutoo: each class is a totally different API though. any similarities should be considered coincidences.
22:02 karolherbst: although I guess this is pure software
22:03 imirkin_: display-over-metro-ethernet
22:04 HdkR: karolherbst: GFN? Swapchain interception
22:04 karolherbst: "in my days"-tm you would have used hamachi for that
22:04 HdkR: or Gamestream. Same tech really
22:05 karolherbst: you could also just create your own VPN :p
22:05 imirkin_: "in my day" (tm), we had 9600 baud dialup :p
22:05 karolherbst: fun
22:05 karolherbst: my 35 years older uncle was talking about those times :p
22:06 HdkR: pfft, with my 5mbit internet a personal gamestream path over the internet wouldn't work
22:06 karolherbst: HdkR: down?
22:06 HdkR: US internet woo
22:06 karolherbst: that's hardcore
22:06 HdkR: 5mbit up, 150mbit down
22:06 imirkin_: HdkR: i have gbit =]
22:06 plutoo: did you ever encounter official register names
22:06 Lyude: imirkin_: then I definitely should have something around here for that
22:06 plutoo: if they forgot to strip some binary, etc...
22:06 HdkR: imirkin_: I'm getting gigabit down....35mbit up in two weeks
22:07 karolherbst: HdkR: well cable is shit everywhere
22:07 karolherbst: HdkR: huh?
22:07 imirkin_: HdkR: i haven't really tried maxing the upstream, but i've definitely gotten like 100MB/s down. fun to look at.
22:07 karolherbst: fiber with crappy up? what's that
22:07 HdkR: It's fiber to the....node? outside of building? Something like that
22:07 HdkR: Fiber runs to the node for the apartment complex, then coaxial to each apartment
22:07 imirkin_: i have fiber in my apt.
22:07 karolherbst: huhhuu :(
22:07 imirkin_: yay fios
22:08 HdkR: Silicon valley, home of the blarg
22:08 karolherbst: I should get fiber at home too, but I don't think the czech republic is that far though
22:08 HdkR: At least I can mooch off the fiber at work
22:09 karolherbst: and then there are those calling cable fiber...
22:09 karolherbst: super annoying
22:10 mooch: HdkR, u called?
22:11 HdkR: hah
22:36 Lyude: hm, that's rather strange
22:37 Lyude: imirkin_: got your mst stuff running right now, it looks like the displays almost come up but the screens just stay blank
22:37 imirkin_: erm
22:37 imirkin_: have you plugged unplugged?
22:37 imirkin_: i was promised crashes
22:37 imirkin_: (are you 100% sure you're running with my patch?)
22:38 Lyude: let me double check, there are definitely crashes on unplug
22:38 imirkin_: my patches shouldn't affect displays coming up
22:46 Lyude: imirkin_: yeah, triple confirmed i'm definitely running your version of the driver
22:46 imirkin_: and the screens don't come up?
22:47 imirkin_: what if you go to my repo master HEAD^
22:47 imirkin_: i.e. the commit which "does stuff"
22:47 imirkin_: does it work then?
22:48 Lyude: not really, no, but it gets close. I see all of the displays come up without anything on any of the fbs
22:48 Lyude: then a moment later it dies off
22:48 Lyude: wonder if it's got something to do with how this mst hub interacts with nouveau
22:48 imirkin_: and does it work with modesetting?
22:48 imirkin_: i.e. xf86-video-modesetting
22:49 Lyude: yeah, works fine with modesetting
22:49 imirkin_: wtf
22:49 imirkin_: how would any of that be at all different
22:49 Lyude: something else is fishy here
22:49 imirkin_: mmmmmmm fish
22:49 Lyude: so like; the behavior I'm seeing is basicall... ok now i'm really, really confused
22:50 Lyude: so i just reverted to fedora's version of the ddx and mst works
22:50 imirkin_: ls -l /dev/dri
22:50 Lyude: resize called 1920 1080
22:50 Lyude: resize called 5760 1080
22:50 Lyude: ...oops, wrong one
22:50 Lyude: https://paste.fedoraproject.org/paste/0kQoXEI1P~y-hRndwVj5Uw
22:51 imirkin_: hm ok. you won't get hit by this other issue
22:51 Lyude: the only difference I can think of from last time is that I've got an hdmi display hooked up as well
22:51 imirkin_: which randomly kills dri3 for no reason
22:51 imirkin_: (iirc i pushed a patch to nuke it)
22:51 imirkin_: (but it's not on that branch)
22:51 imirkin_: iirc the fedora nouveau doesn't load for pre-nv50
22:51 Lyude: yeah; i've got it manually enabled
22:51 imirkin_: ah ok
22:52 imirkin_: well, i'd greatly appreciate it if you could spend like 30 mins at some point figuring out wtf is up
22:53 Lyude: hm, I might see what's going on here, it's just because if you start x with the non-mst ddx and have the mst display hooked up at the start it works, which makes sense
22:53 imirkin_: unfortunately i have neither DP-1.2-capable nvidia sources, nor sinks where my nvidia boards are located
22:53 imirkin_: yeah
22:53 imirkin_: the mst patches just handle hotplug
22:53 imirkin_: and the TILE property and such
22:53 Lyude: oooh hold on, there it goes
22:54 imirkin_: 6th time is a charm, just like for vinny?
22:54 Lyude: wait, hold on
22:54 Lyude: i'm dumb. it's been loading the nvidia driver this whole time
22:54 Lyude: ugh.
22:54 * Lyude redoes the thing
22:54 Lyude: that's bizarre, seeing as nouveau's ddx still managed to load
22:54 imirkin_: https://www.youtube.com/watch?v=yOFGhnr4rto
22:56 Lyude: wait, no nouveau was loaded but i guess nvidia-nvlink got loaded too? i'm going to retry this just to confirm
22:58 Lyude: yeah, I did have things set up before, the behavior is the same
22:58 Lyude: imirkin_: https://paste.fedoraproject.org/paste/7YkAobgLM89-cIPGM6l0Zw
22:59 imirkin_: ok so
22:59 imirkin_: want to understand the setup
22:59 imirkin_: you have hdmi screen plugged in
22:59 imirkin_: you start X
22:59 imirkin_: *then* you start plugging DP screens
22:59 imirkin_: yes?
23:00 Lyude: correct
23:00 imirkin_: ok. and something is seeing those X connectors show up
23:01 imirkin_: and is trying to actually display something
23:01 imirkin_: but then fail
23:01 imirkin_: ok
23:01 imirkin_: i will look closelier.
23:01 imirkin_: can you get symbols for the crash on unplug?
23:01 Lyude: sure thing
23:02 imirkin_: also, anything in dmesg (like "link training failed")?
23:02 imirkin_: or other harbingers of death
23:03 Lyude: nope, nothing...
23:04 Lyude: https://paste.fedoraproject.org/paste/h5FB0XpG-wK7HaUdfOdK9g backtrace
23:04 Lyude: save that somewhere, fpaste will expire it
23:05 Lyude: worse comes to worse, if you wait until I move I can probably set you up with an mst hub + chamelium, with the hub on a power cutter so you can power cycle it
23:05 Lyude: that's usually what I use for working on mst with machines I'm not in front of
23:06 imirkin_: if you still have it up, do a bt full?
23:06 imirkin_: or at least "i locals"
23:06 Lyude: https://paste.fedoraproject.org/paste/wAholszm9pd5SxtPECMULg
23:07 imirkin_: excellent thanks!
23:07 Lyude: i can do a recompile with no optimization as well if that'd help
23:07 imirkin_: nope
23:07 imirkin_: koutput = 0x0
23:07 Lyude: ahh, lol
23:07 imirkin_: which makes for sadness when doing koutput->count_props
23:08 imirkin_: i don't remember any of that code, so will have to revisit it
23:08 imirkin_: thanks a lot for the info!
23:08 Lyude: np! let me know if you need any more help
23:17 imirkin_: Lyude: well, i think i'm the last person who's interested in xf86-video-nouveau
23:17 imirkin_: at least of the developers ... RH isn't going to put any effort toward it
23:18 imirkin_: and i'm in the unfortunate position that i don't actually have access to the hw to test it all out -- so i very much appreciate your testing :)
23:21 imirkin_: [basically i'd need kepler+ with DP and DP screens in the same place... i have DP screens at work, and no kepler+ DP boards anywhere]
23:21 imirkin_: i should probably try to get a K600 or something
23:23 imirkin_: hrmph. $25 on ebay.