01:15 benh: hrm
01:15 benh: $ git grep dma_mask drivers/gpu/drm/nouveau/
01:16 benh: $
01:16 benh: we don't set the dma mask in nouveau ? we are limited to 32-bit ?
01:16 imirkin: mmmm... skeggsb? airlied? --^
01:18 benh: argh, have to run, ttyl
02:41 jneto: I've just rendered my system more stable.
02:42 jneto: I told here that sometimes my screen freezes when I open some videos on mpv.
02:43 imirkin: and i'm fairly sure that i said that mpv's usage of vdpau + opengl = megafail on nouveau
02:43 jneto: yep
02:43 imirkin: you can either not use vdpau for hwaccel, or you can use vdpau for video output
02:44 imirkin: or not use mpv
02:44 orbea: my solution is to use mplayer for hwdec and mpv for everything else with software decoding
02:44 jneto: I change the video output driver to xv.
02:44 jneto: That was the significant change.
02:44 imirkin: that disables vdpau, i believe
02:44 imirkin: (and opengl)
02:45 imirkin: if you want to get the benefit of vdpau, use vdpau for hwdec *and* output
02:45 imirkin: everything else won't do what you want
02:45 orbea: opengl-hq or opengl both work for vo with nouveau as long as you dont have hwdec=vdpau
02:46 imirkin: sure
02:46 imirkin: an additional benefit of mplayer is that it successfully plays all videos, while pretty much all other players only play a fraction of videos
02:47 imirkin: [admittedly a high fraction, but not 100%]
02:47 orbea: it's kind of conflicting for me, because everything mplayer does right mpv does wrong and vice versa...
02:47 orbea: hwdec is definitely a big plus for mplayer
02:48 jneto: I haven't had luck with mplayer. The first video I tried simply didn't open.
02:48 jneto: "Unexpected decoder output format Planar 420P 10-bit little-endian"
02:48 imirkin: ah, the issue was vdpau actually
02:48 imirkin: at least chances are
03:10 jneto: hwdec=vdpau + vo=vdpau is unstable.
03:10 jneto: https://paste.fedoraproject.org/425087/34768691/
03:12 imirkin: huh. odd.
03:12 imirkin: maybe it still multithreads it
03:12 imirkin: which would still cause it to break, same as vdpau + gl
03:13 dbacc: hey I'm using reverse Prime to connect to my monitor on DP which is hardwired to my nvidia GPU (Lenovo T420). xrandr nicely shows all the ports that are available. However all the ports are marked as "disconnected". Is this a bug of nouveau?
03:14 imirkin: dbacc: maybe. perhaps you're misdiagnosing. mind posting 'xrandr -q' output?
03:14 imirkin: (pastebin)
03:16 dbacc: imirkin: http://pastebin.com/TT5yknDL
03:17 imirkin: looks like you are not misdiagnosing
03:17 imirkin: is the claim that you indeed have a screen connected?
03:17 imirkin: what kernel? anything interesting in dmesg when you plug the screen in?
03:19 dbacc: imirkin: that's the claim, exactly. and it's working on windows, so I assume, basically it should work somehow. kernel version is 4.7.2
03:19 dbacc: dmesg shows no output when plugging in.
03:20 imirkin: is the screen in fact a DP screen connected via a DP cable? what resolution is it?
03:20 imirkin: try booting with nouveau.debug=debug,bios=trace drm.debug=0x1e
03:20 imirkin: which should spew out a lot more info
03:21 imirkin: recently Tom^ reported an issue where we were training the link at too high a rate and for some reason the link training succeeded
03:21 imirkin: but in reality it failed
03:22 dbacc: the screen has DP connection it's connected with a new DP cable, yes. Resolution should be FHD. alright. Going to reboot. BRB
03:22 imirkin: if you want to play with kernel stuff
03:23 imirkin: you can also try editing the list at https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/engine/disp/dport.c#L308
03:23 imirkin: and remove the 540,1; 1080,2; 2160,4 entries
03:23 imirkin: which are the 540mhz-based freqs
03:24 imirkin: actually i guess that's more like advice for Tom^ given his issue...
03:28 imirkin: so Tom^, consider yourself advised.
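For reference, the three entries named above fall out of the standard DisplayPort per-lane rates (1.62, 2.70 and 5.40 Gbit/s) multiplied by the lane count. A minimal sketch, assuming the list pairs total link bandwidth with lane count as the chat suggests; `link_configs` and `skip_rates` are hypothetical names for illustration and do not mirror the actual structure in dport.c:

```python
# Per-lane DisplayPort link rates in units of 10 Mbit/s (1.62, 2.70, 5.40 Gbit/s).
PER_LANE_RATES = (162, 270, 540)
LANE_COUNTS = (1, 2, 4)


def link_configs(skip_rates=()):
    """Return (total_bandwidth, lanes) pairs, fastest first.

    skip_rates lists per-lane rates to leave out, which models
    deleting those entries from the training list by hand.
    """
    configs = [
        (rate * lanes, lanes)
        for rate in PER_LANE_RATES
        for lanes in LANE_COUNTS
        if rate not in skip_rates
    ]
    return sorted(configs, reverse=True)


all_configs = link_configs()
without_540 = link_configs(skip_rates=(540,))
# The entries that disappear are exactly the 540-based ones.
removed = sorted(set(all_configs) - set(without_540))
```

Here `removed` holds the pairs (540, 1), (1080, 2) and (2160, 4), i.e. the entries imirkin suggests deleting so that link training falls back to the 2.70 Gbit/s configurations.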
03:33 dbacc: imirkin: is there anything particular I should look for in the logs?
03:34 imirkin: things that happen when you plug the cable in.
03:35 imirkin: [the full debug log starting with a cable re-plug would be great]
03:36 imirkin: hopes at least something is in there
03:38 dbacc: hmm... it seems like nothing gets triggered when I plug in the cable
03:40 imirkin: but a ton of stuff gets printed on boot i presume?
03:41 dbacc: yes, exactly, not only on boot. there are tons of (repeating?) messages every minute. But don't look like obvious (!) errors
03:42 imirkin: mind pasting the whole thing? perhaps i'll notice something
03:43 imirkin: although from the sound of it, something's totally fubar wrt DP =/
03:44 imirkin: we could be missing some ACPI call to "hook up" the ports, but i've never heard of anything like that being necessary...
03:45 dbacc: wait, was the comma in nouveau.debug=debug,bios=trace drm.debug=0x1e on purpose?
03:45 imirkin: yes
03:46 imirkin: although if you just had nouveau.debug=debug that should still have printed *something*
03:46 imirkin: the bios trace was going to be extra icing on the proverbial cake
03:53 dbacc: haha, okay., didn't know you could nest these commands... hmm the log is 10M. wait a second
03:54 imirkin: ew
03:54 imirkin: oh, i guess i915 is going nuts =/
03:56 dbacc: https://dl.dropboxusercontent.com/u/4425881/nouveau.txt
03:59 imirkin: wow, the intel atomic stuff is nuts
04:01 imirkin: sorry, i'm not sure what's going on here
04:01 imirkin: skeggsb is the DP expert
04:22 dbacc: imirkin: I got disconnected. Did you write anything else after 55th minute last hour?
04:38 orbea: aren't there logs....somewhere? the topic has a link to logs of #dri-devel
04:57 benh: back
04:57 benh: skeggsb: around ?
06:12 Tom^: imirkin: i feel advised.
10:01 yeehi: QUESTION: How much would it help the nouveau project now and in the future, if Wikileaks disclosed the source code of nvidia's most recent drivers?
10:02 karolherbst: none at all, cause that would cause legal troubles
10:03 karolherbst: then we would need to add a clause like wine: everybody who saw nvidia source code at least once isn't allowed to contribute at all
10:04 loonycyborg: hehe this borders on persecution of thoughtcrime
10:05 yeehi: Could the nouveau project handle it as a clean room situation? https://en.wikipedia.org/wiki/Clean_room_design
10:05 karolherbst: it actually does it already
10:06 karolherbst: we only do clean room reverse engineering
10:06 loonycyborg: anyway, nvidia-drivers and nouveau differ in overall design too
10:06 karolherbst: yeah, but knowing how the hardware works is already enough information
10:07 yeehi: Some of the nouveau developers might be tempted to look at the source, and then define what would be needed to make nouveau development more successful. The other developers could look at this improvements list and make advances for nouveau
10:08 loonycyborg: proper documentation of hardware would be more useful
10:08 loonycyborg: rather than trying to decipher nvidia's obfuscated source
10:08 yeehi: What is the explanation for nouveau's poor performance relative to nvidia's blob? Which part of hardware is not understood well enough?
10:09 loonycyborg: lack of support for autoreclocking :P
10:09 karolherbst: graph + reclocking, but the latter part is pretty good these days for tesla kepler and maxwell
10:11 yeehi: Couldn't the knowledge of hardware needed be derived from the nvidia driver source code?
10:12 karolherbst: right, but then we are back to legal problems
10:13 Tomin: leaking of closed source driver source code would be very counter productive and somewhat dangerous for projects like nouveau and it'd be stupid to do such a thing IMO
10:14 Tomin: besides nvidia is working with nouveau people more these days
10:15 karolherbst: also it removes all the fun
10:33 yeehi: That is a good point, karolherbst
10:35 yeehi: Tomin, when Linus said "We shall have to wait and see." about nvidia's espoused desire to be more helpful towards nouveau, that was years ago.
10:35 yeehi: Have they in fact become distinctly more helpful, or was it all words, a public relations exercise, with no substance to it?
10:37 karolherbst: well they don't make money by helping us, but they actually pay the devs working on tegra support within nouveau
10:49 yeehi: Do nvidia provide gratis hardware to nouveau devs? Would that be of much assistance?
10:49 yeehi: How much videoconferencing is there between nvidia and nouveau devs?
10:50 yeehi: Could somebody please describe a situation where it might be said that nvidia is being obstructionist?
10:52 karolherbst: yeehi: I already told you everything, it doesn't benefit them, so why should they help at all? It is their money they spend to help us, that's the thing, I am not aware of any situation where they hinder us in doing anything
11:00 karolherbst: yeehi: what is your goal anyway
11:05 karolherbst: now we are getting somewhere :) nvidia: 1364 nouveau: 1135 score
11:05 karolherbst: 83%, not too bad
11:46 karolherbst: 1147 score now :) and 84%
11:53 Tom^: in?
11:55 karolherbst: pixmark piano
12:27 karolherbst: max(abs(a), 0) = a is pretty wrong now that I think about it...
12:28 karolherbst: max(abs(a), 0) = abs(a) should be the right thing
12:57 Tomin: max(abs(a), 0) = a if a > 0, max(abs(a), 0) = 0 if a <= 0, right?
13:08 Tomin: oops, no
13:08 t-ask: it is more like !=0
13:08 Tomin: it's always abs(a), yes
13:08 Tomin: I'm just confusing everyone here
13:10 Tomin: karolherbst: ignore me, you're right :)
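The identity settled on above can be checked by brute force. A minimal sketch, not taken from the Mesa branch under discussion; `counterexamples` is a hypothetical helper, and NaN is left out on purpose because its handling in `max` differs between Python and GPU hardware:

```python
def counterexamples(rewrite, values):
    """Return the inputs for which max(abs(a), 0) differs from rewrite(a)."""
    return [a for a in values if max(abs(a), 0.0) != rewrite(a)]


samples = [-3.5, -1.0, -0.0, 0.0, 0.25, 2.0, float("inf"), float("-inf")]

# Folding to plain `a` loses the abs() and breaks for negative inputs.
wrong = counterexamples(lambda a: a, samples)

# Folding to abs(a) holds for every sample, since abs(a) >= 0 always.
right = counterexamples(abs, samples)
```

`right` comes out empty, while `wrong` lists the negative samples, which is why the rewrite has to keep the `abs`.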
15:23 Tom^: karolherbst: things are for sure moving forward, ive gained 10 score and ~10 Max FPS in unigine. :p
15:23 karolherbst: :D
15:24 imirkin: 10 max fps is about what i get in unigine too :p
15:24 Tom^: haha
15:24 imirkin: that's at 640x480 of course
15:25 Tom^: http://i.imgur.com/t6XCIGJ.png im almost over 100 !
15:25 karolherbst: Tom^: did you run twice?
15:25 Tom^: i stepped through the scenes before running it.
15:25 karolherbst: although 31 fps min should hint that
15:25 karolherbst: yeah
15:26 karolherbst: Tom^: mind testing my pixmark_piano mesa branch?
15:26 Tom^: sure
15:26 karolherbst: it might worsen things, cause compilation time increases by like 30% for the one pixmark_piano shader...
15:27 karolherbst: but there are some experiments in it
15:27 karolherbst: most likely it will also break things
15:27 Tom^: sounds like fun
15:28 karolherbst: increases perf in pixmark_piano by a lot though
15:28 karolherbst: and with a lot I mean more than 10%
15:28 karolherbst: so the max effect in any real application is below 1% :D
15:29 karolherbst: imirkin: running some passes multiple times increases compilation time by around 15% per round
15:30 imirkin: optimization ain't free
15:31 karolherbst: yeah I know, I just kind of hoped it wasn't that much
15:31 karolherbst: ohh wait, I think I compiled with O0 and g3
15:32 imirkin: -O0 on c++ is pretty much death
15:33 karolherbst: well shader-db needs around 1.4seconds to compile that pixmark shader :D
15:33 Tom^: -O3 ftw!
15:34 Tom^: not that ive ever seen a difference in any benchmark between -O2 and -O3 but the placebo is great.
15:34 imirkin: Tom^: you should try -O1000
15:35 Tom^: imirkin: if only that worked :(
15:35 imirkin: i hear it's 500x faster than -O2
15:35 Tom^: perhaps it optimizes everything out and main() is simply a return 1;
15:35 Tom^: them speeds.
15:36 imirkin: i remember the random cvs checkout of gcc that was shipped with redhat 6.0 (gcc "2.96") optimized some of my programs by just removing loops in them. that was nice.
15:37 karolherbst: :D
15:45 Tom^: karolherbst: i'll make a similar uh benchmark compilation with those gputest benches with upstream mesa and yours. compiling your branch right now
15:56 Tom^: karolherbst: https://gist.github.com/gulafaran/950b06cc5af62336dcc731aaee8170d5 first run is mesa-git and second is your branch. however the JuliaFP32 bench on your run might be wrong since my WM accidentally grabbed it and resized it while it ran
15:58 karolherbst: I ran piano by the way
16:00 karolherbst: k, seems okay though, nothing I didn't expect. More interesting would be a heaven run on the branch
16:00 Tom^: thats running atm.
16:00 Tom^: :p
16:00 karolherbst: a little bit surprised that volplosion got such a high impact
16:02 Tom^: unigine increased by 2 in score
16:02 karolherbst: :D
16:02 Tom^: idk if thats in the margin of error tho ;)
16:02 karolherbst: that might be like random noise
16:03 karolherbst: although heaven is pretty stable between runs
16:03 karolherbst: sadly the effect in piano was rather slim on your machine
16:04 karolherbst: I expected a bit more
16:05 Tom^: this also assumes i guess that your branch is kinda the same as mesa-git
16:05 karolherbst: kind of, yes
16:05 karolherbst: updated yesterday
16:06 Tom^: and that i actually was on kinda new enough mesa-git
16:06 Tom^: because i didnt rebuild it before running it heh
16:08 Tom^: oh well nothing broke at least and piano went up
16:08 Tom^: win win.
16:09 karolherbst: well, compilation time increased a lot
16:22 Tom^: karolherbst: cant say i notice that, unless i can bench it somehow :p
16:34 Tom^: karolherbst: i was on this commit anyways https://cgit.freedesktop.org/mesa/mesa/commit/?id=e7a73b7 but gotta go to work now, i'll simply retest with a fresh mesa-git and add those runs as well to the csv
17:25 pmoreau: If I want to use envydis on code for a GK104, I should use `envydis -mgf100 -Vgk104`, right?
17:27 karolherbst: pmoreau: yes
17:27 karolherbst: make sure you use -w if you use the output from mesa
17:28 pmoreau: Yup, I remember about the -w. It’s just that I was getting some unknown opcodes returned by envydis.
17:29 pmoreau: Which disappeared when I switched to NVE7 for nouveau-compiler… (as my card is an NVE7 and not NVE4 as I was thinking)
17:30 karolherbst: huh
17:30 karolherbst: nve4 and nve7 have the same isa
17:30 pmoreau: True
17:30 karolherbst: sure you were using nve4 before?
17:31 imirkin: pmoreau: without -V gk104 you miss out on all the images ops as well as some other random ones
17:31 karolherbst: sched stuff as well, right?
17:32 imirkin: mmmm i doubt that's restricted
17:32 imirkin: maybe though. dunno.
17:32 karolherbst: it should be
17:32 karolherbst: mhh, there is an opcode for sched though... maybe it does
17:33 pmoreau: I am puzzled, recompiling using the same command line for NVE4 no longer returns those unknown… I must have confused nouveau-compiler somehow with my code.
17:35 pmoreau: Yup, I must be confusing nouveau-compiler, cause I am not getting the exact same output each time I run it…
17:35 karolherbst: better check your bash history :p
17:36 pmoreau: Why?
18:05 karolherbst: imirkin: maybe we should do something like that? https://github.com/envytools/envytools/pull/60
18:06 imirkin: that's a bigger and more annoying and more unnecessary change
18:06 imirkin: i suspect the reason it was bitching was that you were *redeclaring* i
18:06 imirkin: without having a { } around the case
18:06 karolherbst: nope
18:07 imirkin: it did say something about C99 & co
18:07 karolherbst: compiler defaulting to c89 can't compile envytools
18:07 imirkin: either way, the whole codebase is written c89-style
18:07 karolherbst: it uses c99 features though
18:07 imirkin: bumping the version on account of a trivial-to-fix item like that seems unnecessary
18:07 karolherbst: and seriously, I don't want to deal with c89 anymore, cause c99 is superior in every aspect
18:07 imirkin: if, separately, you want to move to C99, then that's a separate discussion
18:07 karolherbst: also gcc defaults to c99 anyway
18:08 imirkin: not my gcc
18:08 karolherbst: seriously, nobody should use c89 anymore today
18:08 imirkin: this is an existing project
18:08 imirkin: C standards shouldn't be flipped between like leaves in the wind
18:09 imirkin: if you were starting a new project, it'd be fine to say "this is C99"
18:09 karolherbst: right, but the current code isn't c89 compatible to begin with
18:09 imirkin: dunno. gcc seems happy.
18:09 karolherbst: some gcc-4.9 wasn't
18:09 karolherbst: newer c enable extension on top of c89
18:09 imirkin: 4.9.3
18:09 karolherbst: *gcc
18:09 imirkin: fine. so C89+extensions
18:09 imirkin: wtvr
18:09 karolherbst: right, but did you set it to c89 or gnuc89?
18:09 karolherbst: that's a huge difference
18:09 karolherbst: right
18:09 imirkin: i didn't set it to anything.
18:09 karolherbst: gnuc89 is pretty much c99 already
18:10 imirkin: i can't tell if you understand what i'm saying or not
18:10 imirkin: the existing code had issues, so i fixed it.
18:10 imirkin: your change was not a fix
18:10 imirkin: it was a policy change.
18:10 karolherbst: right
18:10 imirkin: get mwk to agree with you, and then flip it.
18:11 karolherbst: but it is wrong to say envytools is c89 code
18:11 imirkin: i don't really care either way.
18:11 imirkin: ok - my bad. i meant "gcc default settings"
18:11 imirkin: and if you wanted to add a -stdc=gnuc89 or however you do that, i'd be fine with that.
18:11 imirkin: as it's just documenting the existing situation
18:11 karolherbst: sorry for being so strict about that, but I am personally annoyed whenever I can't use sane c99 features and just default to it
18:11 karolherbst: mhh k
18:12 karolherbst: maybe that would be the better solution
18:12 karolherbst: but I am also against using gnu extensions :p although for nouveau it might be fine
18:12 imirkin: separately, i'm not against flipping to c99. or even c11. get mwk to sign off on it, and you're good to go.
18:12 karolherbst: maybe less as bsd begins to move to non gnu cc
18:12 karolherbst: mwk: ^^ https://github.com/envytools/envytools/pull/60
18:17 karolherbst: although while we are at that, we might just switch to c11 directly, although there aren't any "awesome" features in c11 except maybe some threading stuff
18:17 karolherbst: maybe _Generic macros are nice to have
18:21 pmoreau: Mmh… That’s not going to end well: I compute a base address outside the loop, and increment it within the loop to get a new pointer. But, after some optimisation passes, the new pointer computation completely erases the old value…
18:23 karolherbst: pmoreau: do you know which pass breaks this?
18:24 pmoreau: No, but it could possibly be an error in how I set the CFG.
18:24 pmoreau: I’ll paste the whole output.
18:26 mwk: karolherbst: go for it
18:26 karolherbst: when I find a pass messing up things I usually disable all passes running after the breaking one and do a diff with that pass enabled and disabled, usually shows the issue quite well then (except it is something I don't understand like flattening or so)
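The workflow karolherbst describes (stop the pipeline right after the suspect pass, then diff the output with and without it) can be sketched generically. This is a toy model only: the passes, the instruction strings and the helper names `run_pipeline` and `diff_around_pass` are all hypothetical and have nothing to do with the real nv50_ir pass list.

```python
import difflib


def run_pipeline(program, passes, enabled):
    """Run the passes whose names are in `enabled`, in pipeline order."""
    out = list(program)
    for name, fn in passes:
        if name in enabled:
            out = fn(out)
    return out


def diff_around_pass(program, passes, suspect):
    """Diff the IR right before and right after `suspect`; later passes are skipped."""
    names = [name for name, _ in passes]
    earlier = set(names[: names.index(suspect)])
    before = run_pipeline(program, passes, earlier)
    after = run_pipeline(program, passes, earlier | {suspect})
    return list(difflib.unified_diff(before, after, "without", "with", lineterm=""))


# Toy passes: one harmless, one deliberately broken, one that would hide the damage.
toy_passes = [
    ("drop_nops", lambda ir: [line for line in ir if line != "nop"]),
    ("bad_dce", lambda ir: [line for line in ir if not line.startswith("mov")]),
    ("renumber", lambda ir: [f"{i}: {line}" for i, line in enumerate(ir)]),
]
toy_program = ["mov $r4 base", "nop", "add $r7 $r4 $r6"]

report = diff_around_pass(toy_program, toy_passes, "bad_dce")
```

Because `renumber` never runs, the diff in `report` contains only what `bad_dce` changed, here the removed `mov` line.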
18:26 pmoreau: https://phabricator.pmoreau.org/P104
18:26 pmoreau: karolherbst: I’ll try that
18:26 karolherbst: mwk: k, will update the thing tomorrow then. Any preferences on using gnu extensions?
18:27 karolherbst: pmoreau: mhh usually debug=3 is enough too ;)
18:27 karolherbst: then it won't print the RA stuff
18:27 mwk: I'm not particularly fond of them, but it seems we have a few of these already anyway
18:27 karolherbst: k
18:27 mwk: so as long as it still compiles with clang...
18:27 karolherbst: so the typeof thing is also fine by you?
18:27 karolherbst: I mean the change to the c99 typeof
18:27 mwk: c99 doesn't have a typeof
18:28 mwk: it's a gnu thing
18:28 karolherbst: __typeof__ should be c99
18:28 pmoreau: karolherbst: moar debug output is obviously better! :-D
18:28 karolherbst: I think
18:28 karolherbst: not quite sure
18:28 mwk: it's a gnu extension
18:29 karolherbst: I see
18:29 mwk: I don't like it very much, but it's not like we can easily get rid of it now anyway
18:29 karolherbst: maybe through the _Generic macro :D
18:29 mwk: maybe, I've never looked at that one closely
18:29 karolherbst: conditional paths depending on type of argument
18:30 karolherbst: inside macro
18:30 mwk: I know that much
18:30 karolherbst: k
18:30 karolherbst: pmoreau: sure :D
18:31 karolherbst: pmoreau: which part is the broken thing?
18:31 karolherbst: line 216+
18:31 karolherbst: where the passes are?
18:31 pmoreau: from line 428 and on
18:32 karolherbst: well, 428+ is ra, I can't handle that stuff :D
18:32 karolherbst: ohh
18:32 pmoreau: Or rather, 428 and above are kinda good, but further down is not.
18:32 karolherbst: you mean the RA thing removes it
18:32 karolherbst: uhh
18:32 karolherbst: RA issues, nice
18:33 karolherbst: so optimized SSA form is nice, but after doing RA it is messed up
18:33 pmoreau: Using the names from the version at line 428: it should not reuse %r77 and %r78 within the loop
18:34 karolherbst: mhh
18:34 karolherbst: %r78 is unused
18:35 pmoreau: True, we dont care much of %r78, but %r77 is important
18:35 karolherbst: huh
18:35 karolherbst: the merge+split are a noop put together, right?
18:36 karolherbst: basically mov %r76 %r100; mov %r77 %r101
18:36 karolherbst: mhh I meant 77 and 78
18:36 karolherbst: but you get the point
18:36 pmoreau: Right
18:37 karolherbst: the stuff seems right then
18:38 karolherbst: postRA $r4 is the place where the value is stored
18:38 karolherbst: and it doesn't seem to be touched within the loop
18:38 karolherbst: ohh wait, the loop is somewhere else
18:39 pmoreau: the loop starts in BB:3
18:39 karolherbst: mhh right, I think I see it
18:39 karolherbst: yeah, and ends with BB:6
18:40 karolherbst: k so RA thinks $r4 isn't used anymore
18:40 pmoreau: and $r4 gets rewritten line 1468
18:41 pmoreau: Right, that’s why I’m thinking I must have messed up the CFG, and it thinks it’s a simple if/else rather than a loop.
18:41 karolherbst: yeah, I think I know what that code does now, just not why RA messes up
18:42 karolherbst: pmoreau: which incidents has BB:5?
18:42 karolherbst: or rather
18:42 karolherbst: which ones has BB:3
18:42 karolherbst: should have BB:2 and BB:6
18:43 pmoreau: Right
18:43 pmoreau: I’ll check that
18:44 karolherbst: pmoreau: ohh, I think it looks fine though, because $r4 gets overwritten with $r4 + $r6 + $r2
18:46 pmoreau: No, it gets overwritten with 4 * $r7 == 4 * ($r4 + $r6), with $r6 being the number of iterations
18:46 karolherbst: ohh right, there is a mul, no add
18:47 karolherbst: k, yes, that looks indeed wrong
18:48 pmoreau: So, at iter 1, you get new_ptr = (((base + 0) * 4) + 1) * 4, rather (than new_ptr + 1) * 4
18:48 karolherbst: yeah, I see it now
18:48 pmoreau: er, the second part is wrong, but you got the idea
18:52 karolherbst: the reason it is wrong though (technically) is, that the SSA add first source isn't a result of a phi, so it shouldn't be overwritten in the BB ;) I guess RA indeed fails to see the loop for whatever reason
18:59 pmoreau: BB:3 does have BB:2 and BB:6 as incident (respectively as forward and back edges)
18:59 karolherbst: mhh
18:59 karolherbst: maybe it is missing somewhere else
19:00 karolherbst: sadly we have no easy way to print the incident ones
19:00 karolherbst: although we could print them like we do the outgoing ones
19:01 pmoreau: Yes, that's how I know which are the incident ones, I just edited the print function ;-)
19:01 imirkin: i've added that hack several times
19:01 imirkin: (to print incident edges)
19:01 imirkin: iirc there's code there, just commented out
19:02 pmoreau: Oh right, I looked at it but didn’t even realise it was incident edges… --" Well, it was easy enough to copy/paste the outgoing version and tweak it :-D
19:03 karolherbst: why not print it by default?
19:22 pmoreau: My graph theory skills have never been that high, but they were still higher than they currently are… I need to re-learn some stuff
19:23 imirkin: pmoreau: notes that the tree edges tend to be mislabeled
19:24 imirkin: e.g. forward edges are often not really forward
19:24 imirkin: and following just tree edges doesn't hit everything
19:24 imirkin: those labels tend only to matter for iteration order though
19:24 imirkin: so it doesn't super matter for them to be 100% correct
19:26 pmoreau: But might still lead to some issues, like loops not being detected, couldn’t it?
19:27 imirkin: no
19:27 imirkin: it's not _that_ wrong ;)
19:29 pmoreau: Well, except if it decides to overwrite some values it should not
19:29 pmoreau: Because it should be reused as-is in the next iteration
19:30 imirkin: like i said, that should never happen, at least not with the way the tgsi fe sets things up
19:30 pmoreau: fe?
19:31 imirkin: frontend
19:31 pmoreau: k
19:32 pmoreau: Even though I have been looking at nv50_ir_from_tgsi, that does not mean weird things can’t happen in my SPIR-V -> NV50 IR translation
19:32 imirkin: what's the issue exactly?
19:33 karolherbst: imirkin: RA reuses a register used at the top of the loop
19:34 imirkin: unlikely
19:34 karolherbst: it does though
19:34 imirkin: i want to see the output
19:35 imirkin: chances are the input into the RA is somehow illegal
19:35 karolherbst: https://phabricator.pmoreau.org/P104$1432 postRA
19:35 karolherbst: https://phabricator.pmoreau.org/P104$367 preRA
19:35 pmoreau: I'm trying to translate `base = gid * iter; for (i : iter) { store[base + i] = base + i; }`, but I get `base = gid * iter; for (i : iter) { base = (base + i) * 4; store[base + i] = base + i; }`
19:35 karolherbst: $r4 in postRA
19:36 imirkin: loop_lt.spv - can i see that decoded somewhere?
19:36 pmoreau: Sure
19:36 imirkin: i.e. the spirv ops themselves
19:36 imirkin: but textually represented
19:37 pmoreau: imirkin: Added as a comment to the same link
19:40 imirkin: thanks
19:43 imirkin: the loop counter seems fine... what are you saying is mis-RA'd?
19:43 karolherbst: "22: mul u32 $r4 $r2 $r0 (8)" is wrong
19:43 karolherbst: at least I think
19:43 imirkin: how so?
19:44 karolherbst: ohhhhh wait
19:44 pmoreau: $r4 in that case is supposed to store base, but it gets rewritten every iteration with (base + i) * 4
19:44 karolherbst: pmoreau: no, it ain't
19:44 karolherbst: $r6 is the loop variable
19:44 karolherbst: and the *4 was opt away
19:45 karolherbst: I think
19:45 pmoreau: mul u32 $r4 $r2 $r0: $r0 = $r7, $r7 = $r4 + $r6, $r2 = 0x4
19:45 imirkin: the mul64 expansion is wrong i think
19:46 imirkin: er wait, maybe it's ok. hold on
19:46 karolherbst: pmoreau: mhh, yeah, maybe it is too late for me :/ or too much wine for today...
19:47 pmoreau: karolherbst: :-) No problem, I completely misread from time to time as well.
19:47 imirkin: yeah, i'm like 99% sure the expansion of mul64 is wrong
19:48 imirkin: 35: mad (SUBOP:1) u32 %r110 %r103 %r105 %r108 (0)
19:48 imirkin: 417
19:48 imirkin: er hmmmm
19:48 imirkin: right
19:48 imirkin: at the very least, you don't keep track of a carry
19:49 pmoreau: I should be…
19:49 imirkin: hmmmm
19:49 imirkin: actually is it fine? it might be fine...
19:50 imirkin: yeah. i take that back. it's fine
19:51 pmoreau: It should be fine. I tried a previous version with different values and compared with the CPU. Since then, I just moved it to before RA so that I could create temp variables, in case there is some aliasing on the inputs/output
19:51 imirkin: oh wait
19:51 imirkin: you forgot to multiply the upper terms
19:51 imirkin: doesn't matter here since they're both 0
19:52 pmoreau: Do they matter? They are going to be outside the values stored in a u64.
19:52 imirkin: right. ignore me.
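The point about the upper terms can be illustrated with the schoolbook decomposition of a 64-bit multiply into 32-bit halves. A minimal sketch, assuming plain unsigned arithmetic; it is not the lowering pmoreau wrote, and `mul64_from_halves` is a hypothetical name:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul64_from_halves(a, b):
    """Multiply two u64 values using only 32x32 partial products."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    low_product = a_lo * b_lo            # up to 64 bits wide
    res_lo = low_product & MASK32
    carry = low_product >> 32            # high half of lo*lo feeds the upper word

    # The cross terms land in bits 32..95; only their low 32 bits survive.
    res_hi = (carry + a_lo * b_hi + a_hi * b_lo) & MASK32

    # a_hi * b_hi would be shifted left by 64 bits, so it never reaches
    # a 64-bit result and can be skipped entirely.
    return (res_hi << 32) | res_lo


checks = [
    (0x4, 0x7),
    (0xFFFFFFFF, 0xFFFFFFFF),
    (0x123456789ABCDEF0, 0x0FEDCBA987654321),
    (MASK64, MASK64),
]
matches = [mul64_from_halves(a, b) == (a * b) & MASK64 for a, b in checks]
```

Every entry of `matches` is true, which agrees with the conclusion above: the product of the two upper halves lies outside what a u64 can hold.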
19:54 pmoreau: It took me quite some time to get to the current version, and the initial patch I submitted was quite flawed. :-D
19:59 imirkin: what's the issue btw?
19:59 imirkin: [with this code]
20:03 pmoreau: line 1468: $r4 is reused, even if it shouldn’t, resulting in $r4 = (base + i) * 4 rather than just base
20:04 pmoreau: So, at iter 1, you get new_ptr = (((base + 0) * 4) + 1) * 4, rather (than new_ptr + 1) * 4
20:04 pmoreau: And it only gets worse on subsequent iterations
20:04 imirkin: why shouldn't it be reused?
20:04 pmoreau: *rather than new_ptr = (base + 1) * 4
20:05 pmoreau: Because it changes the behaviour of the code
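To make the difference concrete, the two behaviours can be simulated directly. A minimal sketch under the reading given above, where `$r4` holds `base` and the address is `(base + i) * 4`; the function `addresses` and the flag `clobber_base` are hypothetical names, not anything from the compiler:

```python
def addresses(base, iterations, clobber_base):
    """Return the byte address computed in each loop iteration.

    With clobber_base=True the register that holds `base` is
    overwritten by the multiply result, as in the post-RA code.
    """
    r4 = base
    out = []
    for i in range(iterations):
        new_ptr = (r4 + i) * 4
        if clobber_base:
            r4 = new_ptr  # models $r4 being reused for the mul result
        out.append(new_ptr)
    return out


expected = addresses(base=3, iterations=3, clobber_base=False)
observed = addresses(base=3, iterations=3, clobber_base=True)
```

With `base = 3` the intended sequence is 12, 16, 20, while the clobbered version yields 12, 52, 216: iteration 0 agrees, iteration 1 gives `(((3 + 0) * 4) + 1) * 4 = 52`, and the error keeps compounding after that.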
20:05 imirkin: i mean, the code could be written differently, but why do you say $r4 shouldn't be reused?
20:05 imirkin: it's not a loop variable
20:06 pmoreau: But it is used within the loop to compute new_ptr
20:06 imirkin: oh wait
20:06 imirkin: it is
20:06 imirkin: but there's no phi node??
20:06 pmoreau: It could reuse $r4, as long as it reuses it properly.
20:06 karolherbst: imirkin: because the value is assigned once
20:06 karolherbst: or has it to be a phi node regardless?
20:06 imirkin: ohhhh i see what's going on
20:07 imirkin: the live interval isn't extended to the end of the loop
20:07 karolherbst: right
20:07 imirkin: so... try to find the code that does that
20:07 imirkin: and figure out what you're doing to break it
20:07 imirkin: i'm guessing it's the edge types
20:07 pmoreau: ;-)
20:07 imirkin: and/or the lack of joins
20:08 imirkin: [your edge types are way wrong fyi... not that the tgsi fe's are right]
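The rule imirkin refers to, that a value defined before a loop and read inside it must stay live until the end of the loop, can be sketched on linearized instruction positions. This is a generic illustration and not the code in Mesa's register allocator; `live_interval` and the `(start, end)` loop ranges are assumptions made for the example:

```python
def live_interval(def_pos, use_positions, loops):
    """Return (start, end) of a value's live interval.

    loops is a list of (loop_start, loop_end) instruction positions.
    A value defined before a loop and used inside it is kept alive
    up to loop_end, because the back edge makes the use run again.
    """
    end = max(use_positions, default=def_pos)
    for loop_start, loop_end in loops:
        defined_before = def_pos < loop_start
        used_inside = any(loop_start <= use <= loop_end for use in use_positions)
        if defined_before and used_inside:
            end = max(end, loop_end)
    return (def_pos, end)


# `base` defined at position 2, read at position 10, loop spans 8..20.
with_loop_info = live_interval(2, [10], [(8, 20)])
# If the loop is not recognised, the interval stops at the last use.
without_loop_info = live_interval(2, [10], [])
```

In the second case the register becomes free after position 10 and may be handed to another value inside the loop body, which is the symptom seen with `$r4`.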
20:09 karolherbst: isn't the back edge odd within BB:6?
20:09 pmoreau: So my fears about them being wrong weren’t that off then? :-D
20:09 karolherbst: what is back anyway, never saw that for loops
20:09 imirkin: back is ... "i have a loop"
20:09 imirkin: i wrote up definitions for what all those are SUPPOSED to mean
20:09 karolherbst: I see
20:09 pmoreau: karolherbst: " * - BACK: edges from a node to a parent (or itself) in the spanning tree"
20:10 karolherbst: mhh, make a forward and see if that works
20:10 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_graph.cpp#n339
20:11 imirkin: basically... *every* node in the graph must have at least one inbound tree edge (except the root node)
20:11 pmoreau: why a forward? back seems to be perfect here, since it’s going back to a parent (well, ancestor), isn’t it?
20:11 imirkin: er, s/at least one/exactly one/
20:11 pmoreau: Oh, only one…
20:11 imirkin: forward means that you're going "ahead" in the spanning tree, back means you're going up to a parent, and cross means you're going to another branch
20:12 imirkin: if you don't know what a spanning tree is, please look it up
20:12 pmoreau: I did know, but forgot. I was re-reading about it
20:12 imirkin: hehe
20:13 imirkin: there are 2 main algorithms for building up a MST
20:13 imirkin: (minimal spanning tree)
20:13 imirkin: prim's and kruskal's or something
20:13 imirkin: unless i'm confusing it with convex hull
20:14 imirkin: phew. didn't confuse it.
20:14 imirkin: so what are the convex hull algos called... i forget =/
20:15 imirkin: oh. they don't have fancy names. that's why i don't remember them :)
20:16 pmoreau: :-)
20:16 karolherbst: shouldn't most of the edges be tree nodes in this case?
20:16 imirkin: i remember the angle sorting one... i think there was some other one too
20:17 imirkin: tree edges - yes.
20:17 imirkin: and among other things that the tgsi fe gets wrong is that a if/else is 3 tree edges + a cross
20:17 imirkin: instead of forward edges
20:17 imirkin: however i was too chicken to fix it
20:17 imirkin: since i dunno what else it'd break
20:17 karolherbst: also the one in BB:1 can be a forward
20:18 karolherbst: I meant to BB:1
20:18 karolherbst: in BB:7
20:18 karolherbst: in BB:3 both tree
20:19 karolherbst: and I think with that it might work
20:20 karolherbst: and BB:2 -> BB:3 also tree?
20:23 karolherbst: uhhhh
20:23 karolherbst: I think I found a mistake
20:24 karolherbst: BB:4 has a wrong edge
20:24 karolherbst: edge is BB:7, but it does bra BB:1
20:24 karolherbst: I knew it was a mistake to let 7 and 1 look nearly the same
20:24 imirkin: it's right in the SSA form
20:24 imirkin: i think something moves it from one place to the next
20:24 karolherbst: I am looking at post RA
20:25 imirkin: yeah, ignore the post-RA stuff
20:25 karolherbst: ohhh, k
20:25 karolherbst: yeah, makes sense
20:29 karolherbst: imirkin: just trying to understand things, how does that look ? https://gist.github.com/karolherbst/4e83e69e083880986fe250ae0be57db2
20:29 karolherbst: ohh forgot one change
20:29 imirkin: wrt what?
20:29 imirkin: BB:7 -> BB:1 is a CROSS, not a back.
20:29 karolherbst: tree stuff
20:30 karolherbst: ohh okay
20:30 imirkin: visualize the tree
20:30 imirkin: draw it out
20:30 imirkin: then draw all the tree edges
20:30 imirkin: then it should become immediately apparent what kind all the other edges are
20:32 Tom^: if sensors can't read the fan rpm on a msi 780 gaming something, and it doesn't seem to ramp up the fans when it's starting to heat up, where should i look and what info would be useful to debug it? vbios?
20:33 imirkin: that'd definitely be a start
20:33 Tom^: lol nvm found the issue http://i.imgur.com/gLrzi2i.jpg
20:33 Tom^: the fan is broken xD
20:34 imirkin: "oops"
20:34 karolherbst: well, you have a high speed camera, right?
20:34 karolherbst: you are just trolling us :O
20:35 Tom^: i never troll
20:35 karolherbst: why is bb:7 to BB:1 cross? I really don't see it
20:35 imirkin: cross is what you get when none of the others apply
20:36 karolherbst: isn't it a "normal" tree edge?
20:36 imirkin: actually BB:7 -> BB:1 is a tree edge
20:36 imirkin: tree edges are all the edges in the spanning tree
20:36 imirkin: forward edges are edges to a (deep) child in the spanning tree
20:37 imirkin: back edges are edges to a parent in the spanning tree
20:37 karolherbst: k
20:37 imirkin: cross edges are other edges
20:37 karolherbst: like that then? https://gist.github.com/karolherbst/4e83e69e083880986fe250ae0be57db2
20:38 imirkin: BB3 -> BB5 is tree as well, no?
20:38 karolherbst: ohh right
20:38 karolherbst: orr not
20:38 karolherbst: mhh
20:38 karolherbst: the loop is BB:3 -> BB:5 -> BB:6 -> BB:3
20:39 imirkin: not sure how that's related to anything
20:39 imirkin: step 1: draw spanning tree
20:39 imirkin: i.e. every node except the root has exactly one incoming tree edge
20:39 imirkin: and the root has none
20:39 pmoreau:is reading up some graph course
20:40 karolherbst: okay
20:40 karolherbst: so we have all edges being tree except BB:6 -> BB:3 being a back edge
20:40 imirkin: then, based on that spanning tree, decide whether the remaining edges are forward, back, or cross
20:40 imirkin: that's right
20:40 imirkin: there might be a different spanning tree one could draw, with diff edge classifications. not sure.
20:41 karolherbst: what does a forward edge look like then? something like BB:4 -> BB:5 -> BB:6 and forward would be BB:4 -> BB:6?
20:41 imirkin: that's right
20:41 karolherbst: k
20:41 karolherbst: I think then I got it
20:41 imirkin: e.g. if you have a plain if
20:41 karolherbst: yeah, I know
20:41 imirkin: in the true case it goes to BB:5, in the false case it goes to BB:6
20:42 karolherbst: like an if without an else
20:42 imirkin: yes.
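[note] The edge kinds defined above (tree, forward, back, cross) can be sketched in a few lines. This is only an illustration, assuming the spanning tree is the one a plain depth-first walk happens to pick. `classify_edges` and the two small graphs are hypothetical; they are not nouveau's codegen code and not the CFG from the gist.

```python
def classify_edges(graph, root):
    """Label every edge reachable from root as tree/forward/back/cross.

    graph maps a node to the list of its successors. The spanning tree
    is whichever one the depth-first walk below produces (an assumption;
    a different walk order can give a different labelling).
    """
    entered = {}      # node -> order in which the walk first reached it
    finished = set()  # nodes whose successors have all been handled
    kinds = {}        # (src, dst) -> edge kind

    def visit(node):
        entered[node] = len(entered)
        for succ in graph.get(node, ()):
            if succ not in entered:
                kinds[(node, succ)] = "tree"     # first way into succ
                visit(succ)
            elif succ not in finished:
                kinds[(node, succ)] = "back"     # succ is an ancestor
            elif entered[succ] > entered[node]:
                kinds[(node, succ)] = "forward"  # succ is a deeper descendant
            else:
                kinds[(node, succ)] = "cross"    # none of the others apply
        finished.add(node)

    visit(root)
    return kinds


# Plain "if" without an "else": BB:4 goes to BB:5 or straight to BB:6.
plain_if = {"BB:4": ["BB:5", "BB:6"], "BB:5": ["BB:6"], "BB:6": []}
if_kinds = classify_edges(plain_if, "BB:4")
# BB:4 -> BB:5 and BB:5 -> BB:6 are tree, BB:4 -> BB:6 is forward

# The loop BB:3 -> BB:5 -> BB:6 -> BB:3.
loop = {"BB:3": ["BB:5"], "BB:5": ["BB:6"], "BB:6": ["BB:3"]}
loop_kinds = classify_edges(loop, "BB:3")
# BB:6 -> BB:3 is the only back edge, the other two are tree
```

The sketch makes no claim about how the tgsi fe labels edges, which per the discussion does not follow these definitions exactly.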
20:42 karolherbst: k awesome :)
20:42 imirkin: that said, that's not 100% how the tgsi fe labels these
20:42 imirkin: and i don't know how reliant the rest of the system is on exactly how the tgsi fe labels things
20:42 karolherbst: maybe I should do some technical IT studies, because my studies didn't really include that stuff :D
20:42 imirkin: ok, well, that's kinda core algorithms stuff
20:43 karolherbst: right, and my studies were pretty much in the "applied IT" directon
20:43 karolherbst: *direction
20:43 karolherbst: even more than that
20:43 imirkin: and this isn't applied? you're trying to apply it now...
20:43 karolherbst: we didn't even do any assembly
20:43 karolherbst: the "lowest" thing we did was java :D
20:43 karolherbst: well
20:44 karolherbst: we also didn't have anything compiler related
20:44 imirkin: in case you're interested, you can find a lot of materials on the MIT OpenCourseWare site (ocw.mit.edu)
20:44 karolherbst: actually we had a little bit of graph theory, but not really much
20:44 karolherbst: yeah, I think I actually need to dig through all that
20:45 imirkin: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/
20:45 imirkin: this was the algorithms class i took back in the day (but a few years prior)
20:45 karolherbst: awesome :)
20:46 karolherbst: well I had some more algorithm stuff in general, just not much about graphs
20:46 imirkin: there are newer versions of those as well - i just clicked on a random one. dunno if the newer ones are necessarily better though.
20:46 imirkin: that CLR book is pretty intense
20:47 karolherbst: good :D
20:47 karolherbst: there are actually videos of all lectures :O
20:47 imirkin: if you want a more practical approach, 2nd edition of sedgewick's book is what i used back in the day too. 3rd edition iirc was bad (and/or never completed)
20:48 karolherbst: mhh
20:48 karolherbst: looking through that list of lectures
20:48 imirkin: interesting. apparently there's a 4th edition of that now. fancy
20:49 karolherbst: I did everything up to 15 I think
20:49 imirkin: perhaps the 4th edition came out better. i didn't like 3rd one.
21:15 pmoreau: Ok, I improved the picking of edge types, resulting in all edges being tree edges, except for the one between BB:6 and BB:3 which remains a back one.
21:16 pmoreau: That got rid of the branch to the exit, which is nice, but it still rewrites $r4.
21:16 pmoreau: But that will have to do for tonight
21:17 imirkin: RegAlloc::PhiMovesPass::visit
21:17 imirkin: have a look at that
21:17 imirkin: allegedly that should make things better
21:17 imirkin: oh
21:18 imirkin: except not.
21:18 imirkin: hmmmmmmmmmmmmmmm
21:19 infinity0: RSpliet: how's the fermi reclocking going
21:20 imirkin: "no progress" afaik
21:23 karolherbst: well my patches also improve the situation on fermi, there we also had volting issues, although engine reclocking isn't enabled yet
21:25 infinity0: you guys know i lent him my card right
21:25 karolherbst: how should we
21:26 infinity0: oh i dunno, i thought you talked with each other and stuff :p
21:26 infinity0: would fermi reclocking be useful for the newer cards too? or is it a whole new task each time
21:26 imirkin: it is sometimes a whole new task
21:27 imirkin: in the case of fermi, it's not particularly useful for non-fermi
21:27 infinity0: ah ok, i guess it would be low-priority then
21:27 imirkin: well, priority rarely has anything to do with usefulness
21:28 imirkin: this isn't a business that's trying to achieve some goal
21:28 imirkin: this is a bug of volunteers working on "fun" things
21:28 imirkin: bunch*
21:28 karolherbst: :D
21:28 karolherbst: although REing those memory things should be plenty of fun
21:28 karolherbst: sadly I don't have any fermi gpus anymore
21:28 infinity0: lol, i could also understand "bug"
21:29 imirkin: e.g. i doubt the (admittedly small amount of) work i did of allowing to use nouveau_vieux on a NV34 is going to be exactly useful to a ton of people.
21:29 infinity0: well, sometimes volunteers do find "more useful" stuff also more fun, part of it is having a large audience
21:29 infinity0: i suppose fermi is "getting old"
21:29 imirkin: yeah, depends on the person i suppose
21:30 imirkin: my goal is to actually *reduce* the audience by getting people to buy amd gpu's :)
21:30 infinity0: lol, i will take a note of that for next time :p
21:30 karolherbst: developing for amd ain't fun though
21:31 imirkin: sure
21:31 imirkin: but that has nothing to do with audience size :)
21:31 Calinou: AMD GPUs are worse though...
21:31 karolherbst: right
21:31 Calinou: they don't yet have a Pascal equivalent
21:31 imirkin: Calinou: you mean a gpu without working acceleration?
21:31 karolherbst: Calinou: well "slower" doesn't mean "worse"
21:31 Calinou: karolherbst: also less power efficient
21:32 Calinou: Pascal is king there
21:32 imirkin: Calinou: i think you can match that by just disabling accel on an amd gpu, and it'll be exactly the same :)
21:32 Calinou: and it matters a lot in laptops, which is why the only (good) dGPUs in laptops are NVIDIA
21:32 Calinou: well, nvidia blob works fine, sorry to disappoint you
21:32 Calinou: if you're only here for working things, the obvious answer is using the blob
21:32 imirkin: so does windows, i assume
21:32 karolherbst: well, gpus and laptops are a bad combination anyway
21:32 imirkin: not sure what your point is
21:32 imirkin: nvidia blob is a non-starter, just like windows is a non-starter
21:33 infinity0: nvidia blob kept on crashing my X every 20~+ hours, that's why i switched to nouveau
21:33 Calinou: my studies force me to use a proprietary OS anyway
21:33 Calinou: infinity0: the unreliable suspend of nvidia blob is sad to see, too
21:33 Calinou: that's also the kind of thing that makes me use Windows/macOS :/
21:33 Calinou:doesn't feel like pointing to Narod's "Linux on the desktop" problems list :)
21:33 imirkin: you can point at problems all you want
21:34 infinity0: yes offtopic and heard-it-before-already etc
21:34 imirkin: if your goal is to have open-source code running on your cpu, then any blob is a non-starter
21:34 imirkin: which i think is the goal of many linux desktop users
21:34 infinity0: in 10 years we'll have rust desktop environment and graphics drivers and the world will be whole again
21:34 infinity0: also gnu/rust-hurd
21:34 imirkin: lol
21:35 karolherbst: lol
21:35 Calinou: http://node-os.com/
21:35 Calinou: this is the future
21:35 imirkin: rust is kind of a joke... the first thing it tries to do is download an update off the web. not exactly something you want in a compiler :)
21:35 karolherbst: nodejs is a joke as well
21:36 imirkin: (or maybe it was its build system that tried to download stuff off the web... either way it was a non-starter)
21:36 Calinou: a build system acquiring dependencies from the web is a smart thing
21:36 Calinou: if you believe this is bad, apparently you've never used Windows
21:36 karolherbst: no
21:36 karolherbst: it ain't
21:36 Calinou: ok, compile (this random program) for me for Windows natively
21:37 imirkin: let me guess - you think maven is a good idea too?
21:37 Calinou: you have 20 minutes
21:37 Calinou: yes
21:37 infinity0: the rust build system is being improved yes, don't let that put you off the rest of the language
21:37 Calinou: anything that saves developer time is a good thing
21:37 Calinou: "good code" is a myth, stop fighting for it :)
21:38 imirkin: you know what saves me time? when i have randomly changing code breaking my software.
21:38 imirkin: that's a HUGE time-saver
21:38 imirkin: er wait. i meant waster. huge time-waster.
21:38 karolherbst: well you can always hard code the version you want !11!!1!1
21:38 Calinou: you can pin versions... that's the recommended use of npm
21:38 imirkin: at which point... the thing becomes useless
21:38 Calinou: semantic versioning is there for a reason, too
21:38 infinity0: rust is more stable these days, and yes that's why i hate npm too
21:38 imirkin: i should just be bundling the thing with my software in the first place.
21:38 infinity0: they don't "get" versioning and are also insanely arrogant about it
21:38 karolherbst: bundled libraries are the worst thing... ever
21:38 Calinou: static linking is another option, not a very elegant one but it works
21:39 karolherbst: like ever ever
21:39 Calinou: (or bundling)
21:39 karolherbst: I am sooo happy, software usually does not bundle openssl
21:39 infinity0: pinning versions is equivalent to forking projects for every version
21:39 Calinou: karolherbst: do you want me to point to Narod's Linux problems list? ;)
21:39 karolherbst: Calinou: bundling is bad, get over it
21:39 Calinou: the fact that most Linux software doesn't run as-is on distros is really off-putting for users
21:39 imirkin: perhaps those users are better off on windows
21:39 imirkin: fact is - diff environments have their diff quirks
21:40 Calinou: I'm happy we have AppImage/Snap/Flatpak
21:40 karolherbst: if some security issue is found within openssl, I don't want to wait until 50 packages ship updates to fix the same issue
21:40 Calinou: they're the superior compiler (:D) for desktop users
21:40 imirkin: if your requirement for fixing "linux problems" is to make it work like windows, then you've missed the whole point
21:40 karolherbst: seriously
21:40 infinity0: i've lost track of what you guys are arguing about now :/
21:41 imirkin: infinity0: probably for the best.
21:42 karolherbst: Calinou: also that list is semi right, the big danger is, that most of the points somehow sound sane, but ain't. Of course you can always list the difference between linux and windows and say linux is shit because there are differences
21:42 karolherbst: that's exactly what the list is
21:42 karolherbst: there are some valid points, but that list is overhyped
21:42 imirkin: just reuse the same list, rename it to "Problems with windows", and flip each sentence around.
21:42 Calinou: don't be different for the sake of being different
21:43 Calinou: imirkin: Windows works perfectly with suspend. Scandalous! This must be broken.
21:43 Calinou: ;)
21:43 karolherbst: suspend works for me too
21:43 imirkin: Calinou: now you got it!
21:43 karolherbst: so what
21:43 Calinou: actually it doesn't for me, on my desktop
21:43 Calinou: but you get the point
21:43 Calinou: network breaks after a resume
21:43 Calinou: (despite it being Ethernet)
21:43 imirkin: anyways, suspend almost always works
21:43 imirkin: resume can be an issue
21:43 karolherbst: Calinou: REed driver or fully supported by producer?
21:43 imirkin: :)
21:44 Calinou: karolherbst: Windows 10
21:44 karolherbst: I see
21:44 karolherbst: well there you go
21:44 Calinou: stopped using Linux bare metal :(
21:44 karolherbst: suspend broken for you on windows 10, suspend works without issues for me on linux ;)
21:44 Calinou: as long as suspend works on my laptop I'm happy
21:44 Calinou: (it does)
21:44 karolherbst: the point is, you can create an equally long list for windows
21:45 Calinou: Windows is very good for users, and is starting to become decent for devs
21:45 karolherbst: "good"
21:45 Calinou: seriously, I don't have many things to complain about it
21:45 Calinou: the font rendering, maybe...
21:45 karolherbst: I sure want my OS to send my contact list through the internet
21:45 Calinou: (I prefer macOS or Linux's font rendering by far)
21:45 karolherbst: as well as my internet history :O
21:45 Calinou: karolherbst: I disabled tons of things already
21:45 imirkin: Calinou: go get a refund.
21:46 karolherbst: Calinou: ohhh now you have to customize windows to get it to work, just like linux has to be customized ;)
21:46 Calinou: all OSes have to be customized to work anyway
21:46 karolherbst: right
21:46 karolherbst: well
21:46 Calinou: sane defaults are hardly a thing in software :)
21:46 karolherbst: well mac os x has somehow sane defaults
21:46 Calinou: yeah they're surprisingly sane
21:47 karolherbst: well their selling point is user interface anyway
21:47 imirkin: Calinou: http://www.dourish.com/goodies/see-figure-1.html - enjoy.
21:47 karolherbst: :D
21:49 Smjert: Calinou: you might want to get a firewall too... a hardware one though...
21:49 Calinou: Smjert: no
21:49 karolherbst: those backdoors though
21:49 imirkin: 5. Error logging. Ignore it. Why give yourself an ulcer?
21:49 karolherbst: hihi
21:50 Calinou: karolherbst: did people actually lose money, die or get injured due to those?
21:50 karolherbst: Calinou: sure
21:50 karolherbst: guess what they are used for?
21:50 karolherbst: to find people to kill or "put in jail"
21:50 karolherbst: what did you think? :D
21:51 Calinou: no proof, no clicks
21:51 karolherbst: ahh so they do it for fun?
21:51 karolherbst: I figured
21:51 karolherbst: must be hard to find bad people this way without putting them into jail :O (or kill them)
21:52 karolherbst: I won't go into an argument over this though, because everything else is soo naive, I can't handle that anyway
21:52 Calinou: in 8 minutes you'll be able to talk about this
21:53 Calinou: :-)
21:55 karolherbst: I should work more on my presentation...
22:34 mooch: das me
22:35 karolherbst: mooch: are those registers already within rnndb?
22:43 mooch: nope
22:43 mooch: well, some of them are
22:43 mooch: my focus is on human-readable docs tho :)
22:51 karolherbst: right, but we should have that added within rnndb as well
22:51 karolherbst: a lot of tools are using it
23:55 mwk: mooch: thanks for the PR, I'll have a few comments