00:03karolherbst: imirkin_: add(c0[A], c0[A]) -> mul(c0[A], 2) or shl(c0[A], 1)
00:03karolherbst: but dealing with those memory locations is always messy when writing ops :(
00:04karolherbst: well when you read twice from the same region and add those up, you could also just read once and mul/shift ;)
00:04imirkin_: i thought we did that
00:04imirkin_: oh wait
00:04karolherbst: apparently not that good
00:05imirkin_: we have the inverse
00:05imirkin_: mul(foo, 2) -> add(foo, foo)
00:05karolherbst: I see it now
00:05karolherbst: I wasn't looking pre SSA
00:05karolherbst: why though?
00:06imirkin_: fadd is better than fmul?
00:06karolherbst: add faster than mul?
00:06karolherbst: how does shift compare to fadd?
00:06imirkin_: and imul is slow too iirc
00:06imirkin_: well, shift won't really work for floats :)
00:06karolherbst: shift vs iadd
00:06imirkin_: same i'd think
00:07karolherbst: okay, so we could add a special case to do mul(foo, 2) -> shl(foo, 1) for ints and nice constants?
00:07imirkin_: sounds like you're micro-optimizing
00:07karolherbst: I end up with two reads from c0[0x50] :)
00:07imirkin_: we already do mul(pot) -> shl
00:07imirkin_: but perhaps it loses out to the special "2" case.
00:07imirkin_: you can exclude integers from it
00:08karolherbst: mul(a, 2) -> add is the special case
00:09karolherbst: I guess this runs before mul(pot) -> shl
00:09karolherbst: the shl case is right below :)
00:09karolherbst: maybe just reorder?
00:09imirkin_: or add isFloatType()
00:10karolherbst: well the shl case already does that
00:10karolherbst: I am just wondering if we should prefer shl over add even for 2
00:10karolherbst: because a shl(a, 1) is much nicer than an add(a, a)
00:10karolherbst: or not?
00:11imirkin_: i mean ... add && isFloatType to the == 2 case
00:11imirkin_: but like i said, wtvr
00:12karolherbst: yeah, wtvr
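A minimal sketch of the rule ordering discussed above, written in Python rather than taken from the nouveau codegen. The function name `reduce_mul`, the type strings, and the tuple encoding of instructions are all hypothetical; only the two rewrites themselves (mul by 2 to add, mul by a power of two to shl) come from the conversation. Restricting the "== 2" case to float types lets an integer `mul(x, 2)` fall through to the shift rule.

```python
def is_float_type(ty):
    """Hypothetical stand-in for an isFloatType()-style check."""
    return ty in ("f32", "f64")


def reduce_mul(ty, src, imm):
    """Rewrite mul(src, imm) for an immediate operand.

    Returns a tuple (op, src, operand) describing the replacement.
    """
    # Float-only special case: mul(x, 2.0) -> add(x, x).
    if is_float_type(ty) and imm == 2:
        return ("add", src, src)
    # Integer multiply by a power of two becomes a shift, including imm == 2.
    if not is_float_type(ty) and imm > 0 and imm & (imm - 1) == 0:
        return ("shl", src, imm.bit_length() - 1)
    # Anything else is left alone.
    return ("mul", src, imm)


print(reduce_mul("u32", "c0[0x50]", 2))  # ('shl', 'c0[0x50]', 1)
print(reduce_mul("f32", "x", 2.0))       # ('add', 'x', 'x')
```

With this ordering an integer read such as c0[0x50] appears once as a shift source instead of twice as add sources.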
00:34karolherbst: imirkin_: I actually got some hurt shaders, all inside those dolphin ubershaders. It's where we can eliminate/merge adds/shifts/ands by having a smarter order
00:34imirkin_: could teach the other opt to be smarter
00:35karolherbst: but... and(add(shl(a, 4), 0xffffffe0), 0xfffffff0)
00:35karolherbst: the and is pretty useless here
00:35karolherbst: just writing that opt might be annoying
00:36imirkin_: hard to notice that.
00:36imirkin_: you could distribute it
00:36imirkin_: but that's not generally legal
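One way to see why the `and` in and(add(shl(a, 4), 0xffffffe0), 0xfffffff0) is a no-op is to track known-zero low bits. The sketch below is a hypothetical illustration in Python, not an existing pass: it assumes 32-bit wraparound arithmetic, and the helper names are made up. The sum of two values whose low k bits are zero also has its low k bits zero, so a mask that only clears bits below k changes nothing.

```python
MASK32 = 0xFFFFFFFF


def trailing_zeros(value):
    """Number of zero bits below the lowest set bit of a 32-bit value."""
    value &= MASK32
    if value == 0:
        return 32
    return (value & -value).bit_length() - 1


def and_is_redundant(shift, addend, and_mask):
    """True if and(add(shl(a, shift), addend), and_mask) == add(shl(a, shift), addend)."""
    # shl(a, shift) has `shift` known-zero low bits; the add keeps the
    # smaller of the two operands' known-zero counts.
    low_zero = min(shift, trailing_zeros(addend))
    # Bits that the mask would clear.
    cleared = ~and_mask & MASK32
    # Redundant when every cleared bit is already known to be zero.
    return cleared < (1 << low_zero)


print(and_is_redundant(4, 0xFFFFFFE0, 0xFFFFFFF0))  # True
print(and_is_redundant(4, 0xFFFFFFE0, 0xFFFFFFE0))  # False
```

This only proves redundancy in the easy case of low-bit masks; it does not try to distribute the `and` over the `add`, which is not valid in general.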
00:36karolherbst: but then we have a different thing
00:37karolherbst: add(shl(add(shl(a, 1), 0xfffffffe), 4), 0xffffffe0) parts are from above
00:37HdkR: Dolphin always causing issues :)
00:37imirkin_: not really
00:37imirkin_: just shitty code
00:38imirkin_: crap in, crap out
00:38imirkin_: i wouldn't worry about an extra op here and there
00:38imirkin_: esp if it's something innocuous like a mov
00:38karolherbst: I guess we could write an opt for shl(add(shl(a, b), c), d) -> add(shl(a, b + d), c << d)
00:38karolherbst: or something
00:38karolherbst: but again...
00:39karolherbst: really hard to actually bring me to care enough about it
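The nested-shift rewrite can be checked numerically. Since (x + c) << d equals (x << d) + (c << d) modulo 2^32, the constant picks up the outer shift d while the inner shift amounts add up. The Python sketch below assumes 32-bit wraparound and b + d < 32; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def nested(a, b, c, d):
    """shl(add(shl(a, b), c), d) with 32-bit wraparound."""
    return ((((a << b) + c) & MASK32) << d) & MASK32


def merged(a, b, c, d):
    """add(shl(a, b + d), c << d) with 32-bit wraparound."""
    return ((a << (b + d)) + (c << d)) & MASK32


def dolphin_expr(a):
    """add(shl(add(shl(a, 1), 0xfffffffe), 4), 0xffffffe0) from the log."""
    return (nested(a, 1, 0xFFFFFFFE, 4) + 0xFFFFFFE0) & MASK32


a = 0x1234
print(nested(a, 1, 0xFFFFFFFE, 4) == merged(a, 1, 0xFFFFFFFE, 4))  # True
# The whole expression folds to a single shift and a single add.
print(dolphin_expr(a) == ((a << 5) + 0xFFFFFFC0) & MASK32)         # True
```

For the expression quoted in the log, the two constants combine into 0xffffffc0, leaving add(shl(a, 5), 0xffffffc0).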
00:39imirkin_: welcome to my world
00:41karolherbst: all I wanted was to fix some silly bindless issues :D
00:41imirkin_: welcome to my world :)
00:41imirkin_: you pull on a thread ...
00:41imirkin_: and the whole ball starts to unravel
00:41karolherbst: mhh but this one is really annoying. Image handle inside struct and I don't get the image type....
00:41karolherbst: so I do 1D although I have to do a 2D
00:42karolherbst: although actually
00:42karolherbst: I have the 2D
00:42imirkin_: 1d might not be the safest default ;)
00:42karolherbst: so I fixed that
00:42karolherbst: one of the coords is a nop
00:42imirkin_: really you should assert
00:42karolherbst: I would
00:42karolherbst: nir always gives 4 components :)
00:43imirkin_: i mean assert when you don't know the image type
00:43HdkR: Time to switch over to LLVM to get the most pristine of output :P
00:43karolherbst: ohh I know it, I was just not following the deref correctly
00:43karolherbst: nir bug
00:45karolherbst: "(expression ivec2 f2i (swiz xy (var_ref gl_FragCoord) )) (var_ref color) )" obvious that something wants to use two coords here, no?
00:46karolherbst: in nir I get "vec4 ssa_5, ssa_1, ssa_1, ssa_1" (ssa_5: gl_FragCoord.x ssa_1: undefined)
00:46imirkin_: swiz xy == xyyy
00:46imirkin_: although undefined for gl_FragCoord.y seems odd :)
00:47karolherbst: nir thinks it is a 1d image :)
00:47imirkin_: oh, so that gets dce'd
00:47imirkin_: welp, good luck!
00:47imirkin_: probably just mis-attaching something
00:49karolherbst: it doesn't even get dce
00:49karolherbst: it is the actual nir input
00:49karolherbst: glsl_to_nir should be the culprit
03:34imirkin: karolherbst: where does that fs-gatheroffset-uniform-offset.frag test even come from? i don't see it in piglit
03:34imirkin: ohhh, gatherOffset
04:06imirkin: karolherbst: also, any objections to my v2 patch "nvc0: restore image binding on RGB10A2, remove from BGR10A2"
04:06imirkin: (you tested the v1)
08:26vedranm: imirkin: if you remember, modprobe -r nouveau that was broken on 4.14 on the particular machine seems to work in 4.15
08:54mastermind193_: i am occupied with jurisdict... stuff, however after some months i can offer my helping hand on cracking the crypto of powering firmware, whatever that currently is!
08:57mastermind193_: it is what i understood, whoever karolherbst is or what his achievements are, he has some trouble of cracking it, i tried to help with theory but i was not that accurate then, theoretically it is very easy task
08:57mastermind193_: i am more supporter of this open source branch of the drivers in theory, as i like them more, but nouveau especially has slight work to be yet done
14:04karolherbst: okay nice, I am pretty much done with the nir thing. +2/-4 passes in full piglit run
14:06karolherbst: pmoreau: today I will rebase the OpenCL stuff
14:09pmoreau: Cool! I’ll have absolutely no time to play/test it before the weekend, and possibly even up to the 16th of April.
14:10pmoreau: karolherbst: I brought my old Radeon HD 6870 back from home, so I should be able to work on clover while not touching Nouveau. :-D
15:14karolherbst: pmoreau: done rebasing nouveau_nir_spirv_opencl_v3 :)
15:24pendingchaos: imirkin: have you looked at patch 4 of the 4th version of the conservative rasterization patches?
15:27pmoreau: karolherbst: Perfect, thanks!
15:27karolherbst: pmoreau: test_basic: FAILED 21 of 84 tests :)
15:28pmoreau: :-) What’s the biggest category of fails?
15:29karolherbst: non global memory
15:29karolherbst: and work offsets
15:29karolherbst: generic pointers as well
15:30karolherbst: tests with images are just passing because non supported though
15:30pmoreau: Okay, so that’s adding another ~20 failed tests then :-p
15:30karolherbst: yeah well. i think I would work on images next or try to fix those tiny issues
15:31karolherbst: I doubt it will be difficult though
15:31karolherbst: should be quite easy
15:47imirkin_: pendingchaos: i think i glanced at it and it seemed fine. i'll need to look more carefully. is everything else reviewed? if so, i'll do it tonight and push
15:48pendingchaos: patches 2 and 3 are reviewed
15:49pendingchaos: I'm hoping to release an updated patch 1 with a few small changes after patch 4 is looked at
18:11glisse: danvet: oh i misunderstood your first email
18:11glisse: i am just worried the rdma folks enforce everybody to register their struct page ...
18:11danvet: I might have been confusing
18:12danvet: there's too many totally complicated mm vs. gpu topics floating around right now :-/
18:12glisse: no likely a lack of coffee in my blood
18:12danvet: nah, I always wanted dma-buf import to allow non-struct-page backed memory
18:12danvet: because there's all kinds of funny stuff going on
18:12danvet: stolen ranges, p2p, numa nodes the kernel doesn't know about
18:13glisse: iirc Dan from intel tried to introduce pfn_t in more places
18:13glisse: which is a pfn value with flag
18:13danvet: iirc I complained to airlied that his first ttm dma-buf importer just dug out the struct page, but sounds like it's all fixed now
18:13glisse: that can tell you if there is a struct page or not behind
18:13danvet: yeah I read some of the lwn summaries
18:13danvet: imo for gpu buffers sgt is good enough
18:13danvet: it wastes a bit of space for when you don't have a struct page around
18:13danvet: but oh well
18:14danvet: gets the job done at least
18:14glisse: for HMM i want to push my dma changes
18:14glisse: idea is that HMM fill in the iommu page table directly
18:14danvet: otoh you can coalesce, so as long as you don't suck too bad at keeping stuff contiguous it should be fine
18:14glisse: i should send rfc later this week
18:14danvet: cc: dri-devel for this stuff?
18:15danvet: I'm totally out of the loop on all this hmm stuff :-/
18:15glisse: yeah dri-devel mm linaro-mm
18:15danvet: yeah I'm still subscribed to linaro-mm and mm, but long stopped trying to keep up to date on those ...
18:15glisse: linaro-mm is low traffic i think
18:15glisse: well i merge all this in same folder
18:16glisse: but i always had the feeling that linaro was low traffic
18:17danvet: hm yeah
18:17danvet: I still have it filtered even
18:17danvet: I should read it again more regularly
18:18danvet: once it became epic respins of CMA I kinda stopped
18:18danvet: anyway, less stuff assuming dma_addr_t (in an sgt or somewhere else) is backed by memory with a struct page, the better imo
18:18glisse: i wish CMA died ... like why can't soc pay for a 1cents iommu
18:18danvet: when I spot new dma-buf importers I always try to make them only look at the dma_addr_t
18:19danvet: or if they do have to look at the struct page, at least check it's there and fail the import if that's not the case
18:19imirkin_: glisse: hard to change the hw that's already out there
18:19danvet: e.g. the xen 0copy thing obviously needs a struct page for the grant hypercall
18:20danvet: also, hw is cheap
18:20glisse: it's not like cell phone get kernel update :)
18:20danvet: and apparently the pte walking is too expensive for display in terms of power budget
18:21glisse: i guess the tlb block verilog is 2 cents on ebay ;)
18:21danvet: nah, it's the random access needed that blows up the latency budget
18:21danvet: instead of most minimal streaming reads
18:21danvet: so you have to refill your fifos more aggressively
18:22danvet: wake up the memory more often
18:22danvet: goes all downhill
18:22danvet: hw people even here at intel regularly freak out about the display tlb fetches :-)
18:23imirkin_: but then you pat them on the head and say "it'll be ok"
18:23danvet: otoh we do 5 levels + iommu on each level :-)
18:23danvet: imirkin_, judging by how much fun we have with underruns, unfortunately not :-/
18:24imirkin_: i guess you'd know better, but it seems unlikely that your underrun problems have to do with tlb lookup latency
18:25imirkin_: just based on personal observation
18:25danvet: slightly more serious: display has a dedicated pagetable with a shadow to do the iommu lookups needed at pte write time
18:25danvet: so that pte fetching is a nice linear streaming read
18:26danvet: but yeah, display hw folks freak out about latency all the time :-)
18:26imirkin_: well they _really_ have to get their data
18:26imirkin_: OR ELSE
18:26imirkin_: all the dgpu's have it local in vram
18:27imirkin_: i don't remember if it has to be physically contiguous for nvidia, but it might be
18:28imirkin_: (actually i have no clue if it even goes through the MMU...)
18:47karolherbst: pmoreau: "[TTM] Could not find buffer object to map" any idea what this is all about?
18:47karolherbst: I am hitting this in the barrier test
18:58pmoreau: karolherbst: No clue, sorry
21:56Lyude: oh sweet
21:56Lyude: mst fallback retraining stuff is mostly done and fixing up the weston tablet support series is going a lot faster than expected, so it's very likely i'll be back working on nouveau soon :)
21:57imirkin_: Lyude: i assume you have some "late" model (maxwell2+) gpu's sitting around ... any chance you have HDMI 2.0 sinks?
21:57karolherbst: imirkin_: uhm we treat MS levels 0 and 1 pretty much equally inside mesa/gallium, don't we?
21:57imirkin_: karolherbst: we do.
21:58Lyude: imirkin_: i sure do, also do you still need that mst testing? I realized the other day I got distracted
21:58Lyude: oh wait, hdmi 2 sinks
21:58imirkin_: Lyude: i do.
21:58karolherbst: imirkin_: yeah, with the new non MS CTS tests I hit an assert where we have 1 <= 0 (old level vs new level or something)
21:58imirkin_: karolherbst: there's a LOT of confusion about it
21:58Lyude: mind giving me the stuff you need testing with again? i'll do it now so i don't forget. also, let me look up hdmi 2 and see if I'd have anything for that around
21:58karolherbst: yeah, I can guess
21:58karolherbst: imirkin_: KHR-GL45.shader_image_size.advanced-nonMS-*
21:59imirkin_: Lyude: iirc you tested and said it crashed. but i need more info than just the existence of a crash to debug :)
21:59Lyude: imirkin_: is this also known as HDMI MHL?
21:59karolherbst: imirkin_: with mesa master I had to remove all of those and KHR-GL45.copy_image.functional to get a full run :)
21:59imirkin_: Lyude: https://github.com/imirkin/xf86-video-nouveau
21:59imirkin_: that's the ddx with the patch to handle DP-MST "stuff"
22:00karolherbst: imirkin_: did you make changes since last time I tested it?
22:00imirkin_: Lyude: hdmi 2.0 adds a bunch of things, i'm sure. biggest one is higher frequencies.
22:00imirkin_: karolherbst: i did not
22:00imirkin_: karolherbst: i don't think you provided me with much to go on
22:00karolherbst: I could probably get you more information next week
22:01karolherbst: now that I also have my desktop and various GPUs
22:01karolherbst: ohh wait, no MST there
22:01plutoo: does compute class registers overlap with 3d class?
22:01imirkin_: Lyude: basically you need hdmi 2.0 for 4k@60
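The 4k@60 claim can be checked with pixel-clock arithmetic. The numbers below are not from the log: the total timing (4400 x 2250 including blanking) is the standard CTA-861 timing for 3840x2160 at 60 Hz, and the 340 MHz and 600 MHz limits are the maximum TMDS clocks usually quoted for HDMI 1.4 and HDMI 2.0. The sketch assumes full RGB/4:4:4 at 8 bits per component, where the TMDS clock equals the pixel clock.

```python
# CTA-861 total timing (active + blanking) for 3840x2160.
H_TOTAL = 4400
V_TOTAL = 2250

# Maximum TMDS clocks in kHz (assumed figures, see lead-in).
HDMI_1_4_MAX_TMDS_KHZ = 340_000
HDMI_2_0_MAX_TMDS_KHZ = 600_000


def pixel_clock_khz(h_total, v_total, refresh_hz):
    """Pixel clock in kHz for a mode given its total timing."""
    return h_total * v_total * refresh_hz // 1000


clock_60 = pixel_clock_khz(H_TOTAL, V_TOTAL, 60)
clock_30 = pixel_clock_khz(H_TOTAL, V_TOTAL, 30)

print(clock_60)                           # 594000
print(clock_60 <= HDMI_1_4_MAX_TMDS_KHZ)  # False
print(clock_60 <= HDMI_2_0_MAX_TMDS_KHZ)  # True
print(clock_30 <= HDMI_1_4_MAX_TMDS_KHZ)  # True
```

So 4k@30 fits under the older limit at 297 MHz, while 4k@60 at 594 MHz needs the higher frequencies.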
22:01imirkin_: plutoo: class methods you mean? yes, sometimes. not always.
22:01imirkin_: karolherbst: it looks like between all of us, we have the requisite equipment
22:02imirkin_: but unfortunately, the cables aren't long enough :)
22:02karolherbst: well, how does nvidia do the display over wifi thing?
22:02karolherbst: should work through the internet as well, no?
22:02imirkin_: plutoo: each class is a totally different API though. any similarities should be considered coincidences.
22:02karolherbst: although I guess this is pure software
22:04HdkR: karolherbst: GFN? Swapchain interception
22:04karolherbst: "in my days"-tm you would have used hamachi for that
22:04HdkR: or Gamestream. Same tech really
22:05karolherbst: you could also just create your own VPN :p
22:05imirkin_: "in my day" (tm), we had 9600 baud dialup :p
22:05karolherbst: my 35 years older uncle was talking about those times :p
22:06HdkR: pfft, with my 5mbit internet a personal gamestream path over the internet wouldn't work
22:06karolherbst: HdkR: down?
22:06HdkR: US internet woo
22:06karolherbst: that's hardcore
22:06HdkR: 5mbit up, 150mbit down
22:06imirkin_: HdkR: i have gbit =]
22:06plutoo: did you ever encounter official register names
22:06Lyude: imirkin_: then I definitely should have something around here for that
22:06plutoo: if they forgot to strip some binary, etc...
22:06HdkR: imirkin_: I'm getting gigabit down....35mbit up in two weeks
22:07karolherbst: HdkR: well cable is shit everywhere
22:07karolherbst: HdkR: huh?
22:07imirkin_: HdkR: i haven't really tried maxing the upstream, but i've definitely gotten like 100MB/s down. fun to look at.
22:07karolherbst: fiber with crappy up? what's that
22:07HdkR: It's fiber to the....node? outside of building? Something like that
22:07HdkR: Fiber runs to the node for the apartment complex, then coaxial to each apartment
22:07imirkin_: i have fiber in my apt.
22:07karolherbst: huhhuu :(
22:07imirkin_: yay fios
22:08HdkR: Silicon valley, home of the blarg
22:08karolherbst: I should get fiber at home too, but I don't think the czech republic is that far though
22:08HdkR: At least I can mooch off the fiber at work
22:09karolherbst: and then there are those calling cable fiber...
22:09karolherbst: super annoying
22:10mooch: HdkR, u called?
22:36Lyude: hm, that's rather strange
22:37Lyude: imirkin_: got your mst stuff running right now, it looks like the displays almost come up but the screens just stay blank
22:37imirkin_: have you plugged unplugged?
22:37imirkin_: i was promised crashes
22:37imirkin_: (are you 100% sure you're running with my patch?)
22:38Lyude: let me double check, there are definitely crashes on unplug
22:38imirkin_: my patches shouldn't affect displays coming up
22:46Lyude: imirkin_: yeah, triple confirmed i'm definitely running your version of the driver
22:46imirkin_: and the screens don't come up?
22:47imirkin_: what if you go to my repo master HEAD^
22:47imirkin_: i.e. the commit which "does stuff"
22:47imirkin_: does it work then?
22:48Lyude: not really, no, but it gets close. I see all of the displays come up without anything on any of the fbs
22:48Lyude: then a moment later it dies off
22:48Lyude: wonder if it's got something to do with how this mst hub interacts with nouveau
22:48imirkin_: and does it work with modesetting?
22:48imirkin_: i.e. xf86-video-modesetting
22:49Lyude: yeah, works fine with modesetting
22:49imirkin_: how would any of that be at all different
22:49Lyude: something else is fishy here
22:49imirkin_: mmmmmmm fish
22:49Lyude: so like; the behavior I'm seeing is basicall... ok now i'm really, really confused
22:50Lyude: so i just reverted to fedora's version of the ddx and mst works
22:50imirkin_: ls -l /dev/dri
22:50Lyude: resize called 1920 1080
22:50Lyude: resize called 5760 1080
22:50Lyude: ...oops, wrong one
22:51imirkin_: hm ok. you won't get hit by this other issue
22:51Lyude: the only difference I can think of from last time is that I've got an hdmi display hooked up as well
22:51imirkin_: which randomly kills dri3 for no reason
22:51imirkin_: (iirc i pushed a patch to nuke it)
22:51imirkin_: (but it's not on that branch)
22:51imirkin_: iirc the fedora nouveau doesn't load for pre-nv50
22:51Lyude: yeah; i've got it manually enabled
22:51imirkin_: ah ok
22:52imirkin_: well, i'd greatly appreciate it if you could spend like 30 mins at some point figuring out wtf is up
22:53Lyude: hm, I might see what's going on here, it's just because if you start x with the non-mst ddx and have the mst display hooked up at the start it works, which makes sense
22:53imirkin_: unfortunately i have neither DP-1.2-capable nvidia sources, nor sinks where my nvidia boards are located
22:53imirkin_: the mst patches just handle hotplug
22:53imirkin_: and the TILE property and such
22:53Lyude: oooh hold on, there it goes
22:54imirkin_: 6th time is a charm, just like for vinny?
22:54Lyude: wait, hold on
22:54Lyude: i'm dumb. it's been loading the nvidia driver this whole time
22:54Lyude:redoes the thing
22:54Lyude: that's bizarre, seeing as nouveau's ddx still managed to load
22:56Lyude: wait, no nouveau was loaded but i guess nvidia-nvlink got loaded too? i'm going to retry this just to confirm
22:58Lyude: yeah, I did have things set up before, the behavior is the same
22:58Lyude: imirkin_: https://paste.fedoraproject.org/paste/7YkAobgLM89-cIPGM6l0Zw
22:59imirkin_: ok so
22:59imirkin_: want to understand the setup
22:59imirkin_: you have hdmi screen plugged in
22:59imirkin_: you start X
22:59imirkin_: *then* you start plugging DP screens
23:00imirkin_: ok. and something is seeing those X connectors show up
23:01imirkin_: and is trying to actually display something
23:01imirkin_: but then fail
23:01imirkin_: i will look closelier.
23:01imirkin_: can you get symbols for the crash on unplug?
23:01Lyude: sure thing
23:02imirkin_: also, anything in dmesg (like "link training failed")?
23:02imirkin_: or other harbingers of death
23:03Lyude: nope, nothing...
23:04Lyude: https://paste.fedoraproject.org/paste/h5FB0XpG-wK7HaUdfOdK9g backtrace
23:04Lyude: save that somewhere, fpaste will expire it
23:05Lyude: worse comes to worse, if you wait until I move I can probably set you up with an mst hub + chamelium, with the hub on a power cutter so you can power cycle it
23:05Lyude: that's usually what I use for working on mst with machines I'm not in front of
23:06imirkin_: if you still have it up, do a bt full?
23:06imirkin_: or at least "i locals"
23:07imirkin_: excellent thanks!
23:07Lyude: i can do a recompile with no optimization as well if that'd help
23:07imirkin_: koutput = 0x0
23:07Lyude: ahh, lol
23:07imirkin_: which makes for sadness when doing koutput->count_props
23:08imirkin_: i don't remember any of that code, so will have to revisit it
23:08imirkin_: thanks a lot for the info!
23:08Lyude: np! let me know if you need any more help
23:17imirkin_: Lyude: well, i think i'm the last person who's interested in xf86-video-nouveau
23:17imirkin_: at least of the developers ... RH isn't going to put any effort toward it
23:18imirkin_: and i'm in the unfortunate position that i don't actually have access to the hw to test it all out -- so i very much appreciate your testing :)
23:21imirkin_: [basically i'd need kepler+ with DP and DP screens in the same place... i have DP screens at work, and no kepler+ DP boards anywhere]
23:21imirkin_: i should probably try to get a K600 or something
23:23imirkin_: hrmph. $25 on ebay.