00:00 mupuf: Lyude: power gating is quite hard
00:00 mupuf: it involves the PMU
00:00 mupuf: and I never managed to get it to do anything
00:02 Lyude: mupuf: I knew it couldn't have been as easy as that patch was. still, "hard" translates to challenge for me :P
00:03 mupuf: hehe
00:03 mupuf: it is a complex beast...
00:03 mupuf: and honestly, I spent most of my time trying to figure it out
00:03 mupuf: but clock gating is hard to follow
00:03 mupuf: at the very least, we should reproduce what we know is associated with clock gating
00:05 Lyude: mupuf: so just lots of load testing to make sure we don't brownout, and making sure power is actually getting saved correct?
00:05 mupuf: well, testing will come from users
00:05 mupuf: Recommendations
00:05 mupuf: Recommendations: Run heavy benchmarks for a couple of hours
00:05 mupuf: if you get no crashes, it is good enough ;)
00:06 mupuf: same recommendations as overclockers, really
00:06 Lyude: sounds good to me
00:06 mupuf: but we cannot have it on by default
00:06 mupuf: I wonder if tainting the kernel would be a little too extreme or not
00:07 mupuf: but we should add in the kernel logs or mesa's logs when clock gating and power gating is enabled
00:07 Lyude: to be fair I can think of worse things the kernel doesn't taint on that I wish it did
00:07 mupuf: or reclocking
00:07 mupuf: now, I should probably be in bed already. Good night guys!
00:07 Lyude: alright. I'm guessing the eventual plan is to turn it on by default once we actually all the parts implemented and know that it doesn't break?
00:07 Lyude: oh, good night!
00:08 Lyude: *actually have all
00:08 mupuf: Lyude: let's talk about that when we get a hundred users who tested enabling clock gating, while being at the high perflvl
00:09 mupuf: when we get to this point, we'll be quite good!
00:09 Lyude: sgtm
00:09 mupuf: clock gating should not introduce as big an instability as improper reclocking does
00:09 mupuf: brownouts are a reality though
00:10 mupuf: but hopefully, the voltage guard band is safe enough
00:10 mupuf: we could create a program that would run benchmarks and report results
00:10 mupuf: so we can keep track of the stability
00:10 mupuf: but that may be a little extreme :D
00:11 mupuf: let's try our best first
00:11 mupuf: and we'll see
00:30 PyroSamurai: server client conked out, seems I missed rubdos
00:30 PyroSamurai: hi mwk
00:31 PyroSamurai: btw we have huge list of people on the chat but only see the regulars talking, is that the norm?
00:35 Teklad: PyroSamurai: It is.
00:35 Teklad: I drop in from time to time to be a slave driver.
00:35 PyroSamurai: Doesn't make much sense. I mean you don't really have to lurk on a channel that is publicly logged to make sure you don't miss anything.
00:36 Teklad: PyroSamurai: Reading through chat logs is informative.
00:36 Teklad: I get to keep up with nouveau progress on Pascal.
00:37 PyroSamurai: Teklad: Indeed it is informative, but I mostly referencing those who never talk. They can simply read the public backlog after all.
00:37 PyroSamurai: I am*
00:38 Teklad: PyroSamurai: I haven't figured that out either... There's thousands of users in your average #xx linux distro channel.
00:38 Teklad: Yet maybe 30-50 of them are active.
00:39 PyroSamurai: Teklad: that I get though since most channels don't have public logs like nouveau and the freedesktop channels
00:41 Teklad: PyroSamurai: But why rot your life away reading logs all day? Lol... that's a TON of log to read.
00:45 PyroSamurai: Teklad: lots of good info
00:45 Teklad: I'd get fat reading all that info.
00:47 PyroSamurai: I think most young people are good at scanning internet info for relevant data. I don't read everything just the technical stuff which helps me understand the process more
00:50 PyroSamurai: but yeah HI LURKERS :D
00:55 PyroSamurai: btw totally gonna setup a site for RE info, both software and hardware, because I have a hard time finding it myself and it is important so yeah.
01:45 Horizon_Brave: I really hate CAPTCHAs at certain times....
01:46 Horizon_Brave: and by certain times I mean 99% of the time
04:03 mooch2: how did i do, s-senpai? https://github.com/envytools/qemu/commit/148c925a54f699c5bc34b4f3ac65bd3353afe3c4
08:04 rubdos: PyroSamurai: yeh, you missed me :')
08:04 rubdos: PyroSamurai: I'm not sure whether it was nouveau failing on me. I had kind of the same symptoms under proprietary now.
08:04 rubdos: Offer still stands though; I can get my hands on several of these T61p's, probably cheaply.
08:05 rubdos: So if you guys want one, for RE or other debugging, let me know.
08:13 dboyan_: rubdos: Is the bug you are facing a GL-related one? If so, you can try to make an apitrace and see if it is reproducible under nouveau.
08:15 karolherbst: mupuf: the effect while running games is kind of low though, there is one, but it's around 2%
08:15 karolherbst: maybe even 3%
08:16 karolherbst: mupuf: I have this one bit enabled for nearly an entire year now :p
08:17 karolherbst: can't say it doesn't affect a desktop GPU though
08:17 mupuf: I once managed to get more like 20% when running xonotic and enabling the auto downclock on idle (using the FSRM) and I had no performance drop at all.
08:17 mupuf: yep, we just need moar data ;)
08:17 karolherbst: well
08:17 karolherbst: auto downclock :p
08:17 karolherbst: sure
08:17 karolherbst: if I cut the clock through a fsrm by 1/8 I also get a power consumption drop of 50%
08:18 karolherbst: but I was speaking about same clock, same voltage just with clock gating enabled/disabled
08:19 karolherbst: mupuf: if you want, I could test the affect on the min voltage requirements with clock gating enabled on my GPU
08:19 karolherbst: *effect
08:20 mupuf: oh, that is a sweet idea!
08:20 mupuf: downclocking (without voltage change) is a poor-man's clock gating
08:21 mupuf: the FSRM does not change the voltage
08:22 karolherbst: I know
08:22 mupuf: but I guess we'll need to use envdump's metric dumper to know for sure
08:23 karolherbst: I simply enable clock gating on my GPU whenever I load nouveau through nvapoke and also set runpm=0, and I didn't notice any stability problems so far
08:23 mupuf: but to do this work, I need to add another fan management technique for the gpu: Keep the temperature constant
08:23 karolherbst: mhhh
08:23 karolherbst: doesn't matter on my system
08:23 karolherbst: you know, EC controlled fan
08:23 mupuf: and that will require a calibration phase = knowing how much power each speed requires
08:24 mupuf: karolherbst: sure, but the temperature will change and that affects power usage
08:24 mupuf: so you get a double effect
08:24 karolherbst: true
08:24 mupuf: and I want to know, for a constant temperature, what is the power usage
08:24 karolherbst: hard to tell
08:24 mupuf: and not get the double improvement that a lower power consumption yields
08:24 mupuf: not so hard though
08:24 mupuf: ok, have to go
08:24 mupuf: see you guys!
08:25 karolherbst: well, you can only get a range :p
08:25 karolherbst: power consumption is really volatile depending on the load
08:25 karolherbst: except we disbaled like _all_ the power saving features
08:25 karolherbst: *disable
08:25 karolherbst: no idea how to do that
09:12 hakzsam: dboyan_: imirkin_: series pushed
09:12 hakzsam: dboyan_: thanks for your work
09:24 rubdos: dboyan_: I'm not sure at all. I think something fishy is going on with memory usage.
09:24 rubdos: I'll check later; at work now.
10:24 dboyan_: hakzsam: Thanks
10:24 hakzsam: np
10:25 hakzsam: dboyan_: what's your next task? :)
10:27 hakzsam: dboyan_: one interesting thing to do is to run a full piglit against pascal and compare with maxwell
10:27 hakzsam: not sure if someone already did that
10:27 hakzsam: (compare with maxwell or kepler)
10:53 mupuf: hakzsam: there is a maxwell and a pascal in reator
10:53 mupuf: the kernel needs to be updated though
10:56 hakzsam: cool
11:30 dboyan_: hakzsam: I guess I want to take my gsoc project into account for the following days :)
11:31 dboyan_: I might want to do some other stuff if I get more time though
11:32 hakzsam: good plan
11:54 technohacker: hey devs, just wanted to know whether nouveau is supposed to manage HDMI audio output through the nvidia GPU
11:54 technohacker: lspci shows that it is managed by snd_hda_intel
13:58 robclark: imirkin_, btw, random question.. how much does codegen care about CFG structure when going (for example) from tgsi -> codegen? Does it care at all about if/loop/etc or just about list of basic blocks and successors/predecessors?
14:04 pmoreau: robclark: IIRC, it is sensitive to the type of edges used (tree, back, forward), but there is nothing special to ifs nor loops (besides the join instruction, but I am not sure whether the CFG cares much about it).
14:04 robclark: ok, so you care about convergence points.. I guess that basically amounts to having >1 successor?
14:05 robclark: err, predecessor
14:06 pmoreau: I would say so
14:06 robclark: ok
14:31 imirkin_: robclark: unfortunately that bit of it is the definition of fragile
14:31 imirkin_: robclark: you look up 'fragile' in the dictionary, it's defined as 'nouveau cfg'
14:32 imirkin_: robclark: aside from various annoyances, the idea is that the CFG edges are annotated based on a MST notion
14:32 imirkin_: HOWEVER
14:32 imirkin_: the tgsi -> nv50 ir converter assigns edge types directly, and gets it wrong
14:32 imirkin_: it does have the advantage of working, but i've been quite fearful of trying to "true it up"
14:34 imirkin_: basically it's the sort of thing that if i break, i'm not sure i'll be able to fix
14:34 robclark: imirkin_, hmm, I was a bit wondering what would happen w/ cl.. consuming spirv (without whatever hints that gfx spirv has about cfg) or llvm (which seems to have no cfg info)..
14:34 imirkin_: since i only pretend to understand RA & such
14:35 imirkin_: i play a compiler writer on TV :)
14:35 robclark: :-P
14:35 robclark: that sounds like a fun tv show :-P
14:35 imirkin_: anyways, i mean it needs CFG in that block A goes to block B
14:36 imirkin_: but there's no distinction between, say, an if/else and a loop
14:36 imirkin_: however there are some heuristics at the end
14:36 imirkin_: which see e.g. an if/else that's dependent on a predicate, and it will flip them into predicated things rather than having branches
14:37 imirkin_: but it just uses the block structure for that
14:37 imirkin_: this will, i suspect, fall FLAT on its face in sight of "more complex control flow"
14:37 imirkin_: esp since it likes to know when there's going to be divergence, and when the divergence ends
14:37 imirkin_: the way that's thrown in right now is manually up-front by the tgsi -> nv50 ir converter
14:38 imirkin_: so if there's no divergence info given by whatever input ir, that won't be great.
14:38 imirkin_: (look for OP_JOINAT and OP_JOIN)
14:39 imirkin_: further fun is that nv50 and nvc0 do this differently, but it's sufficiently compatible to have a fixup pass post-ra
14:41 robclark: ok.. I *think* just given a list of blocks, like llvm (and I guess spirv compute would be similar).. you might have to work out the predecessor blocks yourself but that should give enough info to figure out join points..
14:51 imirkin_: yeah, we don't figure it out today, but we could
14:52 imirkin_: it's some graph algorithm
14:52 imirkin_: we have the graph... and we could copy the graph algo :)
15:19 tstellar: robclark: At the API level LLVM blocks know what their predecessors are.
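The idea discussed above (given only a list of blocks and their successors, work out the predecessors and treat blocks with more than one predecessor as candidate join points) can be sketched as follows. This is only an illustration under stated assumptions: the block names and the dictionary representation are hypothetical, and this is not how nouveau codegen or LLVM store their CFGs. Note that a loop header also has more than one predecessor, so this check alone does not tell an if/else merge from a loop.

```python
from collections import defaultdict


def predecessors(successors):
    """Invert a {block: [successor, ...]} map into {block: [predecessor, ...]}."""
    preds = defaultdict(list)
    for block, succs in successors.items():
        preds[block]  # touch the key so blocks without predecessors still appear
        for succ in succs:
            preds[succ].append(block)
    return dict(preds)


def join_points(successors):
    """Return blocks with more than one predecessor (candidate convergence points)."""
    preds = predecessors(successors)
    return sorted(block for block, p in preds.items() if len(p) > 1)


# Hypothetical if/else diamond: entry -> then/else -> merge
cfg = {
    "entry": ["then", "else"],
    "then": ["merge"],
    "else": ["merge"],
    "merge": [],
}

print(join_points(cfg))
```

For the diamond above, `merge` is the only block reached from two predecessors, so it is the only candidate reported.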
15:20 robclark: ahh, ok.. that simplifies thing
16:24 pq: that is so weird... I boot my laptop (G96), startx (fluxbox), everything is fine. Quit X, start again: somehow it forgets to repaint the root window at start, showing black instead of wallpaper. If anything causes a region to be redrawn, the correct image appears on that region.
16:25 pq: it always forgets to repaint the root window, except on the first start after a (cold) boot.
18:18 karolherbst: meh...
19:01 Lyude: omg no
19:01 Lyude: imirkin_: i was in the middle of trying to write my own multisampling test for that post depth coverage extension
19:01 Lyude: and then i noticed something. went back to master, and tried running that multisampling test using piglit instead of just launching the binary
19:01 Lyude: turns out the test actually works
19:02 Lyude: :|:|
19:05 Lyude: aren't piglit tests supposed to, also just work properly when you run them directly? or have I been misled
19:08 imirkin_: not sure what the distinction is.
19:08 imirkin_: oh, you were running without -fbo perhaps? that can matter.
19:09 imirkin_: or perhaps the person who wrote the test made it work differently?
19:09 imirkin_: and/or in automatic mode?
19:10 Lyude: man i can't believe how much time i wasted on this aaa
19:10 Lyude: and imirkin_ yeah I didn't have -fbo, didn't realize that was a thing I needed
19:10 karolherbst: ask early and often :p
19:10 Lyude: honestly it didn't even cross my mind
19:11 imirkin_: but you learned something :)
19:11 Lyude: i learned a good bit!
19:17 karolherbst: mupuf: mind replacing the pascal with a maxwell2?
19:18 karolherbst: it's no fun if the nouveau module crashes for silly reasons :/
19:29 Lyude: hrm, GM108 should be able to support extensions like ARB_post_depth_coverage shouldn't it?
19:30 Lyude: oh no, gm200 class
19:30 imirkin_: afaik no
19:30 Lyude: right
19:50 Lyude: hold on, I should be able to get mesa working on pascal shouldn't I? or is there something we don't have yet to make that work
19:50 Lyude: hm, actually I wonder if maybe the linux-firmware package on here is just out of date
19:52 Lyude: skeggsb: do we have the firmware for pascal in fedora yet?
19:58 imirkin_: it's in linux-firmware. can't comment on distros.
20:02 imirkin_: you need kernel 4.12 (/drm-next) to use it though
20:03 Lyude: ahhhh, that explains things
20:03 Lyude: thanks for the tip
20:37 karolherbst: uhhh, silly pascal card :/
20:38 Lyude: actually, does anyone have anything with pascal or maxwell2 in it?
20:38 Lyude: that I can ssh into
20:38 karolherbst: Lyude: did mupuf set your key up already?
20:38 Lyude: i have just killed my pascal machine at home and don't have any way of rebooting it :(
20:38 Lyude: karolherbst: nope
20:38 karolherbst: well, then no :p ask mupuf very nicely then
20:39 Lyude: mupuf: can I have access to your nvidia machines pretty please?
20:40 karolherbst: uhm.. why I am so silly and do silly things
20:42 karolherbst: note to myself: do things right from the start
20:42 jamm: hakzsam: regarding line 25 in https://pastebin.com/QKKGuKfL, how is ipa using $r2:$r3 together? Actually, I'm kinda stumped at understanding the syntax of ipa.. i'm looking at maxas gm107.c, still trying to make sense of the grammar there. (Sorry for the late response, been travelling a lot lately)
20:42 Pie_Mage: you should make a time machine and go back in time and secretly stick that note to your monitor
20:43 hakzsam: jamm: the envydis output just follows nvdisasm's syntax which is... unclear
20:43 hakzsam: jamm: $r2 is 64-bit
20:44 jamm: i do know that ipa is one of LINTERP/PINTERP ops, but which one's the source/dest is kinda unclear to me, hmm
20:44 hakzsam: so you have to read it as $r2:$r3
20:44 hakzsam: $r2:$r3 is the dest
20:46 jamm: hakzsam: ah, i see.. so the GPR's are 32bit but the $r2 is being read as $r2:$r3 so as to refer to a 64bit chunk?
20:46 hakzsam: so, it reads a 64-bit value at offset 0x90 in addr space a[], $r0 is some sort of indirection I would say
20:46 hakzsam: right
20:47 hakzsam: ipa $r2:$r3 a[0x90] $r0 0x0 0x1 --> this would be easier to read
20:48 hakzsam: it's displayed like this for pre-maxwell ISA though
20:52 imirkin_: uhm
20:52 imirkin_: that's wrong
20:52 jamm: hakzsam: i see.. so ipa reads a 64bit value from the src and writes it into a 64bit dst, in this case $r2:$r3 (Written as $r2 here)
20:52 imirkin_: ipa does not do multiple components at a time
20:52 jamm: oh
20:52 imirkin_: tex, on the other hand, consumes multiple components
20:52 imirkin_: tex nodep $r1 $r2 0x0 0x1 t2d 0x8
20:52 imirkin_: is really
20:52 imirkin_: tex nodep $r1 $r2:$r3 0x0 0x1 t2d 0x8
20:53 imirkin_: does that make sense?
20:53 hakzsam: ah my bad, I misred
20:53 hakzsam: *misread
20:54 hakzsam: "tex nodep $r1 $r2 0x0 0x1 t2d 0x8
20:54 hakzsam: $r2 is actually $r2:$r3 (ie. a 64-bit addr). Make sure to wait for the two ipa bars."
20:54 imirkin_: well, it's not a 64-bit addr
20:54 imirkin_: $r2 is the x coord, and $r3 is the y coord
20:54 imirkin_: it's a 2d texture sampling op
20:54 hakzsam: ok
20:55 jamm: ah, my bad too! i misread as well
20:55 jamm: it's tex and not ipa
20:56 jamm: imirkin_: by multiple components, you mean any consecutive set of registers?
20:56 imirkin_: right...
20:56 imirkin_: so ...
20:56 imirkin_: IPA = InterPolate A[] (my personal guess)
20:56 imirkin_: A[] is the shader input/output space
20:57 imirkin_: some ops, like LDC can come in LDC.64 and LDC.128 varieties (actually iirc .128 doesn't work)
20:57 imirkin_: which will read 64 bits worth of data and store it to sequential registers
20:57 imirkin_: IPA has no such variants
20:58 jamm: oh wow, now it makes sense
20:58 jamm: thanks a lot :D
20:58 imirkin_: each register ultimately holds a 32-bit quantity
20:58 imirkin_: on nv50 there's functionality to address half-registers as well
20:59 imirkin_: and on GM20B and pascal+, i understand that there's a way to perform fp16 math. i don't know how that's encoded however.
20:59 jamm: right, so these GPR's are 32bit but they can be used in sequence to refer to bigger values for ops' that can handle them
20:59 imirkin_: right - well the op does whatever it wants ultimately
20:59 imirkin_: you don't pass a register to it, you pass it a register id
21:00 imirkin_: and it does with that id whatever it pleases. normally that's reading from the relevant register
21:00 imirkin_: but under certain circumstances, it can also read sequential registers
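The point above, that an instruction receives a register id and may read sequential 32-bit registers starting at that id, can be illustrated with a small helper. This is a sketch only: `registers_read` is a hypothetical name, and the operand width passed in is an assumption the caller supplies, not something decoded from a real instruction encoding.

```python
def registers_read(base_id, bits):
    """List the 32-bit registers covered by an operand starting at base_id."""
    if bits <= 0 or bits % 32 != 0:
        raise ValueError("operand width must be a positive multiple of 32 bits")
    return [f"$r{base_id + i}" for i in range(bits // 32)]


# A 64-bit operand written as $r2 in the disassembly really spans $r2:$r3.
print(":".join(registers_read(2, 64)))

# A plain 32-bit operand covers only the named register.
print(":".join(registers_read(1, 32)))
```

The same helper describes the `tex` case from the discussion, where `$r2` stands for the two coordinate registers `$r2:$r3`.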
21:01 hakzsam: jamm: except this, how about the other comments? makes sense?
21:02 imirkin_: man, nouveau really sucks at dota2
21:03 jamm: hakzsam: about the rd -> wr comment, i put that because i assumed $r0 is being read from or something
21:04 jamm: imirkin_: hmm, maybe it's the source engine? gotta check if cs:go or day of infamy also sucks with nouveau XD
21:04 jamm: i'll be trying em out this weekend
21:04 hakzsam: jamm: yeah, but you already for $r0 in mufu rcp $r0 $r0
21:04 hakzsam: +wait
21:04 jamm: hakzsam: right
21:05 jamm: ah, i was looking the control codes wrong
21:05 hakzsam: wr is for dst registers, rd for src registers
21:06 jamm: okay, so these control codes, they set the read/write dep bars (for the consecutive 3 instructions) all at once via sched, right?
21:06 jamm: i was looking at them procedurally, one by one
21:09 jamm: hakzsam: thanks! i'll send another paste here with the changes, then begin with whatever i've learned on the other shaders :D
21:09 jamm: after i get your LGTM that is
21:11 jamm: fwiw, i'm currently on a pc with 980Ti which i'll have access to till next weekend after which i travel back to work
21:11 jamm: at home i have a 1080 but it should work fine as well, i guess
21:15 hakzsam: jamm: I don't understand your question, but basically:
21:16 jamm: hakzsam: nvm, it was just me typing to myself (gotta keep it afk next time XD)
21:16 hakzsam: for loads, like 'ld $r0 g[0x0]', ld has a variable latency, so you need to emit a write dep bar (wr) to prevent RaW hazards
21:16 jamm: right
21:16 hakzsam: for stores, like 'st g[0x0] $r0', you need to emit a read dep bar (rd) to prevent WaR hazards (in case $r0 is overwritten *after* of course)
21:17 imirkin_: jamm: dota2 appears to be extra-suck
21:17 hakzsam: jamm: there are many corner cases, but the basic idea is simple :)
21:18 hakzsam: 44 vs 175, nice perf
21:18 jamm: hakzsam: makes sense.. i'm also referring to your commits on mesa
21:18 karolherbst: "nice" perf
21:18 hakzsam: :)
21:19 karolherbst: well compared to stock clocks, it's indeed nice
21:19 hakzsam: yeah, 10fps is a joke
21:19 karolherbst: those 44 fps are a joke as well though
21:19 hakzsam: yes
21:19 karolherbst: but maybe dboyan_ helps out here :p
21:20 jamm: console gamers would be happy with >= 30fps :P
21:21 RSpliet: yeah but consoles have AMD
21:21 karolherbst: hakzsam: funny that we are better on maxwell....
21:22 karolherbst: but yeah, 25% vs 29%....
21:22 karolherbst: _big_ difference
21:31 jamm: hakzsam: for waiting on wr 0x0 and wr 0x1, is wt 0x3 appropriate?
21:31 hakzsam: yes
21:31 imirkin_: http://rion.io/2017/02/09/why-wont-you-answer-my-question/
21:32 imirkin_: should be an auto-response in our bugtracker...
21:33 jamm: hakzsam: great! i asked coz i got a bit confused by the usage of wt 0x3 here https://cgit.freedesktop.org/~hakzsam/mesa/commit/?h=gm107_scheduler&id=6a4503026525246b9330da2b08e2caa71a963a5b
21:33 jamm: see "sched (st 0x6 wr 0x0 wt 0x3) (st 0xd wt 0x1) (st 0x1)"
21:34 jamm: here only 0x0 is set, but wt 0x3 is present as well
21:34 jamm: could be one of those corner cases, i guess
21:35 hakzsam: wt happens *before* the instruction
21:36 hakzsam: in this case, we are waiting for $r0 and $r3
21:36 hakzsam: err, $r0 and $r1
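To make the `sched` line quoted above easier to read, the following sketch splits it into its three per-instruction groups and decodes each `wt` value into the barrier indices it waits on. It assumes the textual format exactly as quoted in this discussion and assumes six barriers (bits 0 to 5); the function names are hypothetical and this is not envydis code.

```python
import re

# One "name 0xNN" field inside a parenthesised group, e.g. "wr 0x0".
FIELD = re.compile(r"(st|wr|rd|wt)\s+(0x[0-9a-fA-F]+)")


def parse_sched(line):
    """Split a 'sched (...) (...) (...)' line into one dict of fields per group."""
    slots = []
    for group in re.findall(r"\(([^)]*)\)", line):
        slots.append({name: int(value, 16) for name, value in FIELD.findall(group)})
    return slots


def waited_barriers(wt):
    """Decode a wt bitfield into barrier indices (assumption: six barriers)."""
    return [i for i in range(6) if wt & (1 << i)]


slots = parse_sched("sched (st 0x6 wr 0x0 wt 0x3) (st 0xd wt 0x1) (st 0x1)")
for slot in slots:
    print(slot, "waits on barriers", waited_barriers(slot.get("wt", 0)))
```

In the first group, `wr 0x0` sets barrier 0 for that instruction, while `wt 0x3` waits on barriers 0 and 1 that were set by earlier instructions, since the wait happens before the instruction issues.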
21:39 hakzsam: dboyan_: jamm, btw, one other interesting task is to replace imul/imad by xmad on maxwell+
21:40 hakzsam: the main advantage is that xmad doesn't require any dep bars
21:40 hakzsam: and I have never seen blob using imul/imad on maxwell
21:40 hakzsam: always xmad, but the hard part is to figure out the different flags :)
21:41 imirkin_: the main advantage is that xmad is a lot faster too
21:41 hakzsam: of course
21:41 hakzsam: it needs 6 cycles and no dep bars
21:41 hakzsam: way faster
21:41 imirkin_: but needs figuring out of how it works
21:41 hakzsam: right.
21:41 imirkin_: at least enough to replace imul/imad
21:41 karolherbst: and what problems does xmad have?
21:41 imirkin_: we don't know how it works :)
21:42 hakzsam: and the flags are tricky :)
21:42 jamm: interesting
21:43 hakzsam: one approach is to write a very simple GLSL test with piglit, record a MMT with valgrind-mmt, extract the shaders and think :)
21:44 hakzsam: jamm: does it make sense to use wt 0x3 there?
21:45 jamm: i could definitely help with that.. sounds like a good opportunity for further improvements. Hopefully this sched code work would help me understand the existing instructions better :)
21:45 jamm: hakzsam: i understood that $r0 and $r1 are being waited on, but where are their respective bars being set?
21:45 hakzsam: ok so
21:45 hakzsam: $r0 comes from imul u32 u32 hi $r0 $r0 $r2
21:46 jamm: right
21:46 hakzsam: and it's emitting a wr 0x0, so wt 0x1
21:46 hakzsam: $r1 comes from i2i u32 u32 $r2 neg $r1
21:46 hakzsam: rd 0x1, so wt 0x3
21:46 Lyude: oh, okay, imirkin_ so it looks like I didn't actually completely waste my time here. I just noticed this multisampling test only breaks if I comment out the lines in the fragment shader that enable arb_post_depth_coverage… but if I break things in the actual GL driver, it doesn't break?
21:48 imirkin_: probably because you're disabling early fragment tests?
21:48 Lyude: i was just about to say i thought that was it
21:48 Lyude: that is a relief :)
21:50 hakzsam: jamm: you might want to say: but why do we use a read dep bar there because imad also reads $r1?
21:50 mupuf: Lyude: ok, giving you access now
21:50 jamm: hakzsam: oh, these are instructions above the sched containing wt 0x3
21:50 mupuf: for some reason though, adding your name to wtrpm has been yielding surprising results
21:51 mupuf: AKA: segfault :o
21:51 karolherbst: ...
21:51 hakzsam: jamm: sure. as I said 'wt' is before
21:51 karolherbst: too short?
21:51 hakzsam: mupuf: lol
21:51 Lyude: mupuf: neat
21:51 imirkin_: Lyude: i'd still recommend writing your own test
21:51 jamm: hakzsam: ah, i see now
21:51 imirkin_: i don't think that was a well-written test
21:51 karolherbst: or too many names
21:51 Lyude: good thing I didn't throw out the test I started!
21:52 imirkin_: tbh i'm not entirely sure what that test does
21:52 imirkin_: or how it works
21:52 jamm: hakzsam: so the wt waits on rd/wr's set on the sched before it
21:53 hakzsam: jamm: so, I use a read dep bar because imad writes *into* $r1 and maybe imad can be done before i2i (not the same unit)
21:53 imirkin_: perhaps it's a perfectly fine test. i don't know :) in yours, try to add lots of comments about how it's meant to work.
21:53 Lyude: yeah i am mostly sure it's a bogus test
21:53 hakzsam: jamm: yes, wt applies for sources only
21:55 hakzsam: jamm: https://hastebin.com/faraxivagu.pl
21:55 hakzsam: maybe this will help
21:55 hakzsam: it's not totally correct, just a different way to think of
21:57 jamm: hakzsam: understood, so this explains for that particular unit
21:59 jamm: and as you explained above, the wt 0x3 comes from the bars applied in the previous unit
22:00 hakzsam: why unit? it's not appropriate
22:00 jamm: err, by unit i mean, instruction triplets
22:00 jamm: sorry, not sure what the correct term is
22:01 hakzsam: I would say block of 3 instructions, but I never use that
22:01 jamm: block, yeah that sounds better to me
22:02 jamm: so the wt 0x3 basically waits on the two registers $r0 and $r1 being used by imul and i2i respectively
22:02 jamm: from the previous block
22:02 hakzsam: not sure if it's clear, but dep bars are not set per block (ie. you can emit a wr dep at instruction 0 and wait at instruction 1545)
22:02 hakzsam: that's correct
22:02 jamm: oh!
22:02 jamm: so they're global
22:03 hakzsam: they are
22:03 jamm: i mean, global wrt. to that particular asm
22:03 hakzsam: global to a program
22:03 jamm: right
22:03 hakzsam: but nouveau codegen doesn't support that for some reasons
22:03 jamm: thanks for correcting me, i'll have to get used to some new jargon now ^^
22:04 hakzsam: just noticed, but sched codes on maxwell can be improved in many ways
22:04 hakzsam: like dual-issue
22:05 jamm: you mean the ones on nouveau or in general?
22:06 hakzsam: the code emitter
22:06 hakzsam: in nouveau codegen
22:06 jamm: ah, right
22:06 hakzsam: few things I have in mind: dual-issue, use getReadLatency() and understand the yield flag
22:07 hakzsam: you might be able to get, let's say +10-20% of perf
22:08 hakzsam: (maybe more with shaders bound applications like piano, furmark etc)
22:08 hakzsam: but not +200% :p
22:12 karolherbst: oh no, now I have to fight with Lyude over reator :(
22:14 jamm: hakzsam: yeah, sounds interesting.. doesn't feel like it'd give as much of a boost as replacing imul/imad, i guess
22:14 jamm: but definitely worthwhile improvements
22:15 hakzsam: well, integer multiplications are not really much used
22:15 hakzsam: the yield flag is probably the best, but it's hard
22:16 hakzsam: getReadLatency() is funny though, and should be simple :)
22:17 hakzsam: and dual-issue will become useful with dboyan_'s gsoc
22:17 jamm: cool! glad to see a gsoc project here
22:17 jamm: i did one at wine myself, but that was way high up at API level
22:18 hakzsam: nice
22:18 jamm: i'm trying to discover lower levels now, see how it suits me
22:18 jamm: so far, not tired, but time consuming
22:18 jamm: getting there though, slowly
22:19 karolherbst: there is only one way: work on the kernel :p
22:19 jamm: yeah :d
22:19 jamm: for me, anything graphics is really interesting
22:19 karolherbst: have no fear, the kernel won't bite, only mess up your file systems
22:20 jamm: whether it's high up in shader programs or low down here in asm
22:20 karolherbst: well, asm is on its way out
22:20 karolherbst: you won't do any asm in the kernel
22:20 karolherbst: ohh
22:20 karolherbst: well, no, you won't
22:20 jamm: well, that'd be nice i guess ;D
22:22 jamm: ah, but nothing in kernel interests me atm, except dri/drm probably
22:23 hakzsam: jamm: how many shaders need to be translated in the DDX?
22:24 jamm: hakzsam: 8 i believe, *110*.fp/vp
22:24 jamm: many are similar
22:24 jamm: two of em are different
22:24 hakzsam: ok
22:27 karolherbst: huh, duh, who clears out the scratch register :/
22:28 jamm: hakzsam: is this ok? https://hastebin.com/juturajelo.bash
22:30 hakzsam: jamm: nope, wt 0x0 == no waits
22:30 hakzsam: it's a bitfield
22:30 hakzsam: if wr 0x0, you need wt 0x1
22:30 hakzsam: (I know it's confusing)
22:30 jamm: ah
22:30 jamm: yeah
22:30 jamm: bitfields
22:31 hakzsam: this is because wt 0x0 is "reserved", it's a no-op
22:31 jamm: wt always starts from 0x1, then 0x3,0x7.. to 0x3f
22:31 jamm: okay
22:31 hakzsam: yeah
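The relationship between the barrier index used by `wr`/`rd` and the bitfield used by `wt` can be written down directly. This is a minimal sketch assuming six barriers, which matches the `0x3f` upper bound mentioned above; `wait_mask` is a hypothetical helper name.

```python
def wait_mask(barriers):
    """Build the wt bitfield that waits on the given barrier indices."""
    mask = 0
    for index in barriers:
        if not 0 <= index < 6:
            raise ValueError("barrier index must be in 0..5")
        mask |= 1 << index
    return mask


print(hex(wait_mask([0])))       # barrier set with 'wr 0x0' -> wait with 'wt 0x1'
print(hex(wait_mask([0, 1])))    # barriers 0 and 1 -> 'wt 0x3'
print(hex(wait_mask(range(6))))  # all six barriers -> 'wt 0x3f'
print(hex(wait_mask([])))        # no barriers -> 'wt 0x0', i.e. no wait
```

Because the mask is a bitfield, waiting on barrier 1 alone gives `0x2`, not `0x1`.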
22:33 hakzsam: jamm: fmul doesn't need any dep bars
22:34 jamm: hakzsam: yeah, just realized the fixed latency ops don't require any bars, am i right?
22:34 jamm: iirc, it was mentioned in some of your comments
22:34 jamm: i'll have to re-read them to understand better
22:35 jamm: as well as the control codes article on maxas
22:35 hakzsam: well, instructions which have a fixed latency don't need any bars yes
22:35 hakzsam: after a quick look, except wt 0x0 and fmul it looks good
22:36 jamm: thanks a lot! i know many of the questions i asked were redundant, but your help really kicked things up a notch for me :D cheers
22:36 hakzsam: but I will need to think more on the morning to make sure there are no corner cases :)
22:36 hakzsam: no worries
22:36 hakzsam: feel free to send me an update version
22:36 hakzsam: *updated
22:37 jamm: hakzsam: updated https://hastebin.com/giwiyipafa.bash
22:37 jamm: i'll work on the other shaders with similar structure in the mean time
22:37 * jamm goes to sleep, cya guys!
22:38 hakzsam: okay, see you
22:38 hakzsam: I will send you comments tomorrow
23:40 phoenixz: Hi there, I was here a month or so ago with problems on linux mint 18 with three nvidia video cards.. Ended up using only two with two monitors because anything else simply didn't work. I've decided to buy a new video card with multiple outputs, so that I can connect 3 monitors on one card using HDMI (I don't want to use converter cables from DVI and that sort of stuff). I have budget of about 200 - 300 USD, but I'm located in Mexico.. Could
23:40 phoenixz: anybody help me with recommending a video card?
23:40 phoenixz: As in, I don't want to buy yet another video card (I'll have to sell these three for near nothing now anyway) and again get stuck with a system that doesn't work..
23:47 gnarface: phoenixz: my recommendation is to drop your hard-line stance against adapters. you should be able to get a cheap adapter to convert HDMI to DVI or DVI to HDMI without any quality loss
23:48 phoenixz: gnarface: Well its not a hard line stance.. I guess its more a "its 2017 why do I still need adapters??" kind of stand, I'd like to avoid it if possible.. If not possible, then adapters it is...
23:49 phoenixz: Are there video cards which can control 3 monitors without problem?
23:49 gnarface: phoenixz: well, it should not be necessary to replace the entire cables, and since HDMI and DVI are both digital, the converters won't cause echoes or anything like that you'd expect with older analog connections
23:50 gnarface: phoenixz: there ARE video cards that can control 3 monitors without a problem, but i can't tell you if any of them currently do or ever will even work with nouveau
23:50 phoenixz: gnarface: yeah, I know, just that ... I dunno, I guess this thing in my head :)
23:50 phoenixz: gnarface: would they work with (ugh) nvidia binary?
23:50 phoenixz: Because so far, nvidia binary basically is a russian roulette fiesta on my computer...
23:51 gnarface: phoenixz: yea, they should with the proprietary drivers, though xorg setup may not be super straightforward and #nvidia is likely to be hostile and useless
23:52 gnarface: phoenixz: the adapters though... i've got these things just laying around. i've gotten a handful over the years for free
23:52 gnarface: phoenixz: certainly it can't be that hard to get your hands on used ones?
23:52 gnarface: seems to me to be the cheapest, sanest, least-intrusive option anyway
23:53 phoenixz: gnarface: I already got me one adapter cable (haven't found "just" adapters here) but they were like 20USD.. Not hugely expensive, but still an annoyance I guess..
23:53 phoenixz: gnarface: Any cards you might recommend on a personal level?
23:55 gnarface: phoenixz: my gf just got one of those new GTX 1060 Ti cards, 6GB edition from ASUS. it's real nice and it has like 8 HDMI ports plus a DVI port. also ~300$ if you find one on sale. i'm almost certain it does not work with nouveau though currently, and if that's true i wouldn't hold my breath for it changing, ever
23:55 gnarface: phoenixz: maybe check ebay for cheap adapters?
23:56 phoenixz: gnarface: Aren't you a nouveau dev? I mean, if my gf ever got a new video card and the driver wouldn't support it.. ;)
23:56 phoenixz: I will
23:56 phoenixz: Well thanks very much, I'll take a look for that GTX 1060 card
23:56 phoenixz: Other problem is that I'm in mexico, it can be a bit hard to get hardware here sometimes :D
23:57 gnarface: phoenixz: heh, sorry no i'm no dev. i just hang out here to watch the progress. i have some older nvidia cards i hope will one day be supported by nouveau fully since the official drivers no longer do and the legacy driver is too old to work with Steam
23:58 phoenixz: Someday hell will freeze over and nvidia will help out with great open source drivers.. oh well, one day..