00:01 calim: yeah flattening ignores the code because there is no conditional
00:02 calim: can you just safely delete the joinat/join yourself ?
00:02 karolherbst: I was already thinking about this, but I wanted to let flattening do this
00:03 karolherbst: but if an empty branch is also empty when it just contains bra/join/joinats
00:03 karolherbst: then I could remove those in empty bbs
00:03 karolherbst: but mhh
00:03 karolherbst: then I have to lookup where the joinat points to
00:03 karolherbst: because the join might be useful
00:07 calim: that's why the flattening pass only deals with simple if-else-endif cases
00:09 calim: either way you have to add extra code to check if the situation allows for removal of the joinat/join
00:09 karolherbst: calim: when the first instruction of a BB is a join and the last instruction of the "previous" BB is the joinat, those can be removed, right?
00:12 karolherbst: or is there a more relaxed rule I can use?
00:11 imirkin_: if there's a direct path from BB A -> BB B, you can nuke the joinat/join pair. it's only if there are conditional branches in between.
00:12 karolherbst: ahh okay
00:12 calim: yes ... joinat BB:n, join == bra BB:n
00:12 karolherbst: well I nuke empty branches in between away, so I should detect those
00:13 karolherbst: imirkin_: so if there is an edge connection joinat.bb and join.bb directly, those can be nuked?
00:13 calim: with BB:n from whichever the last joinat you encountered was
00:13 karolherbst: *connecting
00:13 imirkin_: karolherbst: pretty much yeah
00:13 calim: the join stuff is only useful for divergent branches
00:13 karolherbst: okay
00:13 karolherbst: that should be easy then
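The rule agreed on above (a joinat/join pair is redundant when the joinat block has a direct edge to the join block, with no conditional branch in between) can be sketched on a toy control-flow graph. This is only an illustration: the dict-based CFG and the helper name `can_drop_join_pair` are assumptions and not part of the nouveau codegen API.

```python
# Toy CFG: block name -> list of successor block names.
# All names here are hypothetical; this is not the nouveau codegen API.

def can_drop_join_pair(cfg, joinat_bb, join_bb):
    """True if the block holding the joinat falls straight into the join block.

    With a single outgoing edge there is no conditional branch in between,
    so threads cannot have diverged and the pair does nothing useful.
    """
    return cfg[joinat_bb] == [join_bb]

straight = {"A": ["B"], "B": []}
diverging = {"A": ["T", "F"], "T": ["B"], "F": ["B"], "B": []}

print(can_drop_join_pair(straight, "A", "B"))   # True
print(can_drop_join_pair(diverging, "A", "B"))  # False
```

In the diverging case the join is still needed, because block A ends in a conditional branch and threads may take different paths before reconverging at B.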
00:19 karolherbst: imirkin_: wouldn't that fit more in something like DCE? or some other pass, because nuking useless joinat/join away makes sense anyway. Or are those only produced when actually needed?
00:21 imirkin_: those tend to only be produced when needed
00:21 imirkin_: if you're going around removing blocks, that can mess things up
00:30 karolherbst: k
00:46 rardiol: my laptop just sort of hanged. after rebooting, I have this in the logs: http://nixpaste.lbr.uno/raw/qR78hK41 . looks like a nouveau problem?
00:46 imirkin_: yeah. it's an unsolved, undiagnosed problem
00:47 imirkin_: May 06 00:31:17 kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 3 [X[781]] get 002000c8a8 put 002000c8b4 ib_get 000002e3 ib_put 0000036d state 80004610 (err: INVALID_CMD) push 00400040
00:47 imirkin_: that happens sometimes
00:47 imirkin_: no idea why or how to fix it.
00:50 rardiol: imirkin_: me even less, but thanks for the info. Can I do something about it?
01:11 imirkin: well, this particular issue only happens on nv50-era gpu's... but beyond that i don't know much about it
01:14 rardiol: I just checked, I seem to have gotten it quite a few times already. Don't remember if I just rebooted and didn't bother checking the cause or what.
01:15 rardiol: There's some errors involving DMA_PUSHER, but the surrounding nouveau errors are different
01:15 imirkin: sometimes those errors kill, other times they're harmless
01:16 rardiol: ah, that explains why I didn't see it.
01:16 imirkin: the surrounding errors tend to be a direct result of that pusher error
01:24 rardiol: but I get some errors before DMA_PUSHER
01:29 rardiol: would extra logging give some useful info? or just the logs I have now?
01:31 imirkin: sorry, no clue
08:59 pq: I got an email for mmiotrace's printing on the PCIDEV lines, that it may need some fixing. Who wants to get CC'd with my reply?
09:00 pq: I already have karolherbst. How about mwk? skeggsb?
09:00 pq: who's been working with mmiotrace, more importantly, written tools to parse the traces?
10:05 mwk: pq: give me that CC
10:06 pq: mwk, cool, what's your email?
10:13 mwk: koriakin@0x04.net
10:16 pq: mwk, thanks, sent. Decided to cc Ben too.
10:36 karolherbst: pq: already replied without looking here first :D
10:36 karolherbst: pq: well I think it would be nice to have mmiotrace working on ppc too though
10:37 karolherbst: and I think these would be the only archs we care about now? x86 and ppc.
10:37 karolherbst: No idea if nvidia gpus might work on others
10:39 karolherbst: pq: maybe it would make sense to add which projects heavily rely on that tool, just so that we get notified faster about changes, but... changes never really happen there :/
10:40 karolherbst: although it is a great tool
10:40 RSpliet: karolherbst: if you want to get cracking on Tegra, I'd go for ARM and ARM64 too :-P
10:41 karolherbst: RSpliet: where is the closed nvidia driver?
10:41 karolherbst: or is there one?
10:41 RSpliet: .. mostly in user-space :-D
10:41 karolherbst: right ... :D
10:42 karolherbst: anyway, getting tegra documentation shouldn't be the problem
10:42 karolherbst: and if so, we force gnurou to deal with the issue
10:42 RSpliet: well... depends on what you want to know
10:42 RSpliet: but hmm... I don't think envytools works with platform devices, does it?
10:42 karolherbst: mhh
10:42 karolherbst: no idea
10:43 karolherbst: are those mcp79 ethernet cards normal PCI devices?
10:43 karolherbst: "cards"
10:44 RSpliet: yes
10:44 karolherbst: yeah, then I have no clue
10:44 RSpliet: I think I tried it on my Uni's K1 a while ago, didn't work
10:44 karolherbst: could be easy to fix though
10:45 karolherbst: in either case, that doesn't matter much for mmiotrace anyway
10:46 karolherbst: well I think it would really make sense to get it working for ppc though
10:46 karolherbst: maybe not so much today, but maybe ppc desktops will be a thing again
11:24 dcomp: So I've been comparing my mmiotrace with the other gm108
11:25 dcomp: PSTRAPS.STRAPS0_PRIMARY RAMCFG is 0x2 on mine and 0x4 on the other
11:51 karolherbst: dcomp: if you want to find out what goes wrong with memory reclocking on your ddr3 gm108 you should take a look at the SEQ scripts
11:52 RSpliet: ramcfg is a relatively arbitrary number that only has meaning within the context of your VBIOS
11:52 RSpliet: I'm guessing the two VBIOSes differ as well
11:57 karolherbst: RSpliet: the problem is, that this will be a really minor difference between kepler and maxwell :/
11:57 karolherbst: RSpliet: maybe even something stupid as the issue with gddr5 earlier
11:57 karolherbst: I couldn't find any obvious difference between the kepler and maxwell ddr3 traces
11:57 karolherbst: just the random card variance
11:59 karolherbst: dcomp: what might help is tracing nvidia controlled (use marks to mark your reclocking) and then trace nouveau and reclock too
12:00 karolherbst: and maybe those generated SEQ scripts in both traces differ in a fundamental point
12:00 karolherbst: usually they should be quite close
12:11 pq: karolherbst, btw. maybe you could propose adding the nouveau@ list into MAINTAINERS for mmiotrace or something like that, so people would know to CC it.
12:11 karolherbst: pq: that would actually make sense, right
12:11 pq: I don't really know what the kernel practice is, though
12:13 karolherbst: well if you are in MAINTAINERS you have to be aware that users come to you and complain about broken stuff :p
12:13 karolherbst: I think
12:14 pq: I mean, is there a place for a mailing list in MAINTAINERS or is it just people
12:14 karolherbst: everything goes
12:14 karolherbst: L: for lists
12:14 karolherbst: M: for people I think
12:14 pq: cool
12:14 karolherbst: or the maintainer
12:15 karolherbst: yeah
12:15 karolherbst: multiple L: entries seem to be fine
12:15 karolherbst: though I think at least one M: might be needed
12:15 karolherbst: Orphan ones have no M sometimes
12:16 pq: isn't mmiotrace orphaned? :-)
12:16 karolherbst: nope
12:16 karolherbst: issues get fixed as you see :D
12:17 karolherbst: maybe officially orphaned
12:17 karolherbst: but we do care a lot about it
12:17 pq: sure, but as in, is anyone really a maintainer, to want to review the patches touching it?
12:18 karolherbst: I think we would gladly do it
12:18 karolherbst: it would be really bad if we do a kernel update and all of a sudden mmiotrace doesn't work anymore
12:18 karolherbst: so we want to at least test the patches
12:19 pq: indeed, up to you :-) I won't have time for that.
12:19 karolherbst: I think adding the nouveau ML makes sense
12:19 karolherbst: then somebody will maybe pick it up
12:27 karolherbst: pq: I think I will write Ingo and Steven about this, because they seem to be mostly involved with the tracers in general
12:30 pq: a good idea
14:10 dcomp: anyone got an mmiotrace of gm107
14:12 RSpliet: karolherbst: that could simply mean that the Kepler code is incomplete - but only on a feature that wasn't used in practice
14:13 RSpliet: (like "DLL")
14:13 karolherbst: yeah, most likely
14:26 karolherbst: I think I am done with the messy pass by the way :)
14:42 mupuf: so, the jetson regressed very badly
14:42 mupuf: now, it seems like a fence never gets signaled
14:43 mupuf: the funny thing is, when I use the same version of mesa and the kernel as a month back, it still does not work
14:43 mupuf: let's see if there were libdrm patches
14:56 karolherbst: yay
14:56 karolherbst: total instructions in shared programs : 2604639 -> 2583239 (-0.82%) and total gprs used in shared programs : 357515 -> 356476 (-0.29%)
14:56 karolherbst: and one hurt
14:56 karolherbst: ...
14:57 karolherbst: helped: 574 gpr 1595 inst/bytes
14:57 karolherbst: and no shader affected in the official shader-db ...
15:10 karolherbst: ohh mostly saints row benefits from this
15:11 karolherbst: wow
15:11 karolherbst: total instructions in shared programs : 329818 -> 313198 (-5.04%) and total gprs used in shared programs : 33615 -> 32635 (-2.92%) in saints row 3/4
15:11 mupuf: that is ... sizeable
15:11 karolherbst: yeah
15:11 mupuf: especially since the join instructions may be extra costly
15:11 karolherbst: I don't remove joins
15:12 karolherbst: this is just useless crap
15:12 karolherbst: like
15:12 karolherbst: if (condition) { //empty } else { //empty }
15:12 karolherbst: nothing else
15:12 karolherbst: and the condition gets DCEed away after my pass ran
15:12 mupuf: I see, they use a lot of macros I guess :D
15:12 karolherbst: and sometimes it nukes 10% of a shader
15:12 mupuf: DCE?
15:13 karolherbst: dead-code-elimination
15:13 mupuf: ack
15:17 mupuf: #define NOUVEAU_FENCE_MAX_SPINS (1 << 31) <-- that sounds overly long to me, but also does not tell anything about the actual time it takes. I would say something like a minute would be nice
15:17 mupuf: on the nvea, it takes something like 5 minutes to expire
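As a rough back-of-the-envelope check of the two figures just quoted (2^31 spins, roughly five minutes on the nvea), the cost of one spin and the spin count for a one-minute limit can be estimated as below. The five-minute value is mupuf's approximate observation, so the results are only order-of-magnitude estimates, and the variable names are made up for this sketch.

```python
SPINS = 1 << 31              # NOUVEAU_FENCE_MAX_SPINS as quoted above
OBSERVED_TIMEOUT_S = 5 * 60  # "something like 5 minutes" on the nvea (rough)

# Average wall-clock cost of one spin, assuming the whole timeout is spinning.
per_spin_s = OBSERVED_TIMEOUT_S / SPINS

# Spin count that would correspond to about one minute on the same machine.
spins_for_one_minute = int(60 / per_spin_s)

print(f"{per_spin_s * 1e9:.0f} ns per spin")
print(f"{spins_for_one_minute} spins for about one minute")
```

On these numbers one spin costs on the order of 140 ns, and a one-minute limit would sit a little above 2^28 spins on that machine. A spin count still says nothing about wall-clock time on a faster or slower CPU, which is the concern raised above.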
15:18 imirkin_: yeah, it's an incredibly long time
15:18 imirkin_: and it's never supposed to ever ever happen. ever.
15:18 imirkin_: ever. never.
15:18 imirkin_: never ever.
15:18 mupuf: yep, I know, I am trying to figure out why this is happening
15:19 imirkin_: that means we messed something up while attempting to write the fence
15:19 imirkin_: or the gpu is hung
15:20 mupuf: imirkin_: the kernel would say something, right?
15:20 mupuf: and FYI, running wflinfo is enough ... 50% of the time
15:20 imirkin_: should yea
15:20 imirkin_: lol
15:21 imirkin_: well THAT could be something a lot more harmless
15:21 imirkin_: since wflinfo probably doesn't do a lot of rendering
15:21 mupuf: right
15:21 imirkin_: so that could be us doing something dumb
15:21 mupuf: piglit tests are all hitting it though
15:21 imirkin_: ah ok
15:21 mupuf: I am using GBM though
15:22 mupuf: wait a sec, maybe last time, I did run something before running piglit
15:22 mupuf: like, a KMS demo or something
15:22 imirkin_: one reason this could happen is if the gpu's writes aren't visible on the cpu
15:22 imirkin_: due to some cache screwup
15:22 imirkin_: although i thought gnurou fixed that
15:22 mupuf: yeah, but I would not expect that to happen on a 4.5 kernel
15:23 imirkin_: not a kernel thing
15:23 imirkin_: it has to do with how the fence bo is created
15:23 mupuf: well, running kmscon was enough to hang the machine
15:23 imirkin_: sure, if fences never updated pretty much everything gets sunk
15:24 mupuf: yeah, but why is my ssh access going down then ;)
15:25 imirkin_: my tk1 hangs after a few minutes usually
15:25 imirkin_: (without even loading nouveau)
15:25 mupuf: mine has been very stable
15:26 mupuf: until yesterday
16:03 drathir: guys 110'C for passive gpu is critical?
16:04 drathir:wonder if that have any thermal emergency shutdown when too high temp...
16:05 drathir: G84
16:07 imirkin_: yes, that's high
16:07 mupuf: drathir: yes, it is close to the absolute max which is supposed to be 132°C in most G84
16:08 mupuf: clean your damn GPU and make sure there is an airflow in your machine ;)
16:20 drathir: mupuf: imirkin_ thanks a lot... that mean or gpu damaged or no thermo pasta onboard i guess...
16:21 drathir: mupuf: its passive clean one.. mb have 46'C
16:21 karolherbst: drathir: listen to what mupuf said
16:21 karolherbst: drathir: air flow.. very important
16:23 drathir:guess need attach some hand maded fan i guess... or there is really no pasta left....
16:25 karolherbst: drathir: well maybe you can plug in your GPU in a different slot?
16:25 karolherbst: drathir: usually a fan providing airflow through the case should be enough
16:26 drathir: karolherbst: honestly even not sure if that card is fully functional...
16:28 drathir: karolherbst: its card survived the go to trash scenario...
17:00 karolherbst: anybody want to maintain the mmiotracer?
17:01 karolherbst: I could, but I have less knowledge about it than pq and pq doesn't want to :p
17:02 pq: I've replaced all my kernel knowledge with Wayland and Weston :-)
17:02 karolherbst: :D
17:02 karolherbst: k
17:33 dcomp: is the mmiotracer meant to trace the current instruction pointer?
17:34 imirkin_: karolherbst: did you get this when building the new apitrace +qt5 ? http://hastebin.com/juzecuhuvo.php
17:36 drathir: karolherbst: mupuf few min fan equipped 42'C ;p
17:36 mupuf: karolherbst: you know better than anyone here
17:36 karolherbst: imirkin_: qt5.6?
17:37 drathir: but fresh after restart for sure go up ;p
17:38 karolherbst: mupuf: meh, and I think I know like 20% :/
17:38 mupuf: drathir: well, now it is really cold for a G84
17:38 karolherbst: dcomp: well I have a silent cooled mcp79 in my mac mini and it usually reaches 90°C in idle
17:40 dcomp: drathir: ^
17:41 drathir: mupuf: for sure should go up, its fresh booted up...
17:47 drathir: karolherbst: mupuf thats fan is from amd gpu fabric cooling system... performance shouldnt be so bad in theory...
17:48 drathir: amd gpu/amd cpu*
17:52 imirkin_: karolherbst: i think 5.5
17:53 imirkin_: karolherbst: 5.5.1
18:03 karolherbst: imirkin_: well with current master I don't get any build failures
18:03 karolherbst: imirkin_: ohh you mean the ebuild?
18:08 drathir: karolherbst: mupuf thanks a lot for help...
18:23 imirkin_: karolherbst: yes
18:23 imirkin_: karolherbst: the 7.1 ebuild
18:30 karolherbst: imirkin_: compiled just fine here
18:30 imirkin_: hrmph
18:30 imirkin_: against qt 5.5?
18:30 imirkin_: or 5.6?
18:31 karolherbst: 5.6
18:34 imirkin_: hrm... looks like it's going down the QT_OPENGL_ES_2 path
18:35 karolherbst: imirkin_: right, you shouldn't build qt5 with egl because of stupid reasons
18:35 imirkin_: ah.
18:36 karolherbst: at least in older versions turning on EGL meant turning off OpenGL
18:36 karolherbst: and EGL meant GLES
18:36 imirkin_: got it.
18:37 karolherbst: mhh maybe egl was fine in qtgui,
18:37 karolherbst: have to check
18:37 karolherbst: imirkin_: the qtopengl ebuild does this: "-opengl $(usex gles2 es2 desktop)"
18:39 karolherbst: maybe I had compile issues somewhere with qtgui[egl] and qtopengl[-gles2]
18:39 karolherbst: but there was something funky
18:41 imirkin_: heh
18:41 karolherbst: well I am 100% sure that qtopengl[gles2] is bad :D
18:42 imirkin_: yeah
18:42 imirkin_: i'm doing -gles2 on both qtgui and qtopengl
18:42 karolherbst: right
18:42 karolherbst: maybe the EGL thing is fixed by now
18:42 imirkin_: and hopefully that'll fix it all right up
18:42 imirkin_: there is no egl on qtopengl
18:42 karolherbst: qtgui
18:42 imirkin_: ah, i have it on for that one
18:42 karolherbst: yeah, but it should be fine I think
18:43 karolherbst: maybe some bad deps
18:43 karolherbst: I know there was a reason I disabled it
18:43 karolherbst: can't remember anymore
18:43 imirkin_: wtvr
18:47 karolherbst: imirkin_: what parameters are the best to just check opengl rendering stuff in piglit?
18:48 imirkin_: not sure what you mean
18:49 karolherbst: ./piglit run -p glx gpu
18:49 karolherbst: I think
18:49 karolherbst: just to test changes in codegen
18:49 imirkin_: https://people.freedesktop.org/~imirkin/
18:49 karolherbst: ahh
18:49 karolherbst: thanks
18:50 karolherbst: I test without -1 :)
18:50 imirkin_: you like to live dangerously i guess
18:51 karolherbst: does reclocking speed it up somewhat?
18:53 karolherbst: imirkin_: do you know if piglit allows you to add a regular expression to filter out dmesg messages?
18:53 karolherbst: so that piglit doesn't think it has anything to do with the run
18:53 imirkin_: don't think so
18:53 imirkin_: at least i'm not aware of it
18:53 karolherbst: okay
18:53 karolherbst: because linux is stupid and always writes into dmesg if my wifi changes the channel..
18:54 karolherbst: or the channel width
18:54 karolherbst: ...
18:55 imirkin_: i think those messages are ignored
18:55 imirkin_: info messages are, i think
18:56 karolherbst: okay
18:57 karolherbst: what is streaming-texture-leak for?
19:23 RSpliet: <rant>
19:24 karolherbst: imirkin_: how fast did parallel piglit runs hang the GPU for you?
19:24 RSpliet: on this entire planet there is only *ONE* datasheet mentioning the lo and hi CL/CWL selection bit in EMRS2
19:25 karolherbst: :D
19:25 RSpliet: guess which ones are on my one GDDR3 card... ugh
19:25 RSpliet: produced by a fine German company, bankrupt since 2009
19:25 karolherbst: :D
19:25 RSpliet: they deserved it...
19:25 RSpliet: </rant>
19:26 karolherbst: RSpliet: but you mean a "physical" datasheet? because otherwise "one" makes hardly sense :p I am sure nvidia also has some
19:42 RSpliet: ah there we go... I should archive all these datasheets
19:43 imirkin_: karolherbst: i dunno... kinda random. allegedly ben says it doesn't hang his gpu's much anymore
19:44 imirkin_: RSpliet: which gpu?
19:45 RSpliet: imirkin_: NVA3/2
19:46 RSpliet: although my NVA0 doesn't have said bit
19:46 RSpliet: (just pulled that monster apart)
19:51 imirkin_: heh
20:12 karolherbst: running saints row 4 with the gallium_hud was really interesting though. I managed to halve the executed instructions without affecting the fps at all...
20:36 karolherbst: imirkin_: if you want to take a look at this (currently running piglit, but it looks good so far): https://github.com/karolherbst/mesa/commit/1b2c6f0fab00fc34a79c079f6d6f7cf509edd48c
20:37 imirkin_: karolherbst: i think you want visit(BasicBlock *bb)
20:37 karolherbst: yeah I saw it already
20:38 karolherbst: I needed this earlier with a slightly worse design
20:38 imirkin_: :)
20:38 karolherbst: but rebindFlowInstructions is really expensive
20:38 karolherbst: I think I only want to run it once
20:38 karolherbst: and splitting into two passes doesn't make sense
20:39 imirkin_: i think there's a wildly simpler way to do it
20:39 imirkin_: i'll have a look later
20:39 karolherbst: I hope there is
20:39 karolherbst: because all my simpler approaches failed somewhere
20:40 karolherbst: okay, thanks
20:43 hakzsam_: karolherbst, maybe you can have a look at the branch perf counters to see if your pass improves efficiency :)
20:44 karolherbst: hakzsam_: nouveau has bigger issues with saints row :D
20:44 karolherbst: hakzsam_: but yeah, I planned to
20:44 hakzsam_: like what?
20:44 karolherbst: hakzsam_: cutting instruction in half in GALLIUM_HUD improved performance by... <5%
20:45 karolherbst: *count
20:45 karolherbst: so I just disabled a bunch of graphical settings to reduce the invoked instructions, but the perf didn't change much
20:45 hakzsam_: yeah okay
20:46 karolherbst: hakzsam_: but my pass for saints row 3/4 shaders: total instructions in shared programs : 329818 -> 313198 (-5.04%) and total gprs used in shared programs : 33615 -> 32635 (-2.92%)
20:46 hakzsam_: that removes unused instructions in shaders, cool
20:46 karolherbst: yeah
20:46 karolherbst: this was the main goal :D
20:46 hakzsam_: sure I know
20:47 karolherbst: sadly no change in public shader-db
20:47 karolherbst: and usually only eon based games are affected
20:48 hakzsam_: that's a good start for eon games which are really badly supported with nouveau anyway
20:48 karolherbst: ohh wait
20:48 karolherbst: totally untrue
20:48 karolherbst: antichamber, shadow_warrior, borderlands...
20:48 karolherbst: even heaven and valley
20:48 karolherbst: and tomb raider
20:48 hakzsam_: all of them are from eon?
20:48 karolherbst: nope
20:48 karolherbst: payday 2
20:49 karolherbst: these are affected games by my pass
20:49 karolherbst: ohh wait
20:49 karolherbst: I compared wrong files...
20:49 hakzsam_: :)
20:49 karolherbst: ohh well
20:49 karolherbst: but I know that valley and heaven have a few affected shaders
20:49 karolherbst: and shadow warrior too
20:50 karolherbst: but not much
20:50 hakzsam_: anyway, removing unused flow instructions like branches are useful even if this doesn't really improve performance
20:50 karolherbst: anyway, I hoped it would increase the perf in eon based games, but it didn't, at least not noticeably
20:50 karolherbst: hakzsam_: nope
20:50 karolherbst: hakzsam_: my pass didn't do this
20:50 karolherbst: flattening already removed those
20:51 karolherbst: it just takes care of this before RA is done
20:51 karolherbst: and throws a DCE in too
20:51 hakzsam_: your pass removes empty branches, right?
20:51 karolherbst: hakzsam_: before RA, right
20:51 karolherbst: hakzsam_: example
20:51 hakzsam_: yeah, that what I said
20:52 karolherbst: hakzsam_: if (condition) { //nothing } else { // nothing }
20:52 karolherbst: condition can be a result of many instructions
20:52 karolherbst: so
20:52 karolherbst: flattening removed this if/else/endif away
20:52 karolherbst: and then condition is dead code
20:52 karolherbst: but it is still computed
20:52 karolherbst: so we could have a simple PostRA-DCE pass dealing with this
20:53 karolherbst: (this was the way how I found those cases)
20:53 hakzsam_: oh okay
20:53 karolherbst: but doing this PostRA, doesn't help much with GPR count ;)
20:53 hakzsam_: sure, won't help a lot :)
20:53 karolherbst: and some shader are like 20% smaller with this
20:54 karolherbst: so yeah, the pass eliminates empty branches, but it does so in order that we can DCE the conditions of conditional branches away
20:55 hakzsam_: makes sense
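The effect described above (once an empty if/else is gone, the instructions computing its condition become dead and can be removed) can be sketched with a made-up three-field IR. Everything in this block, including the instruction names and the `dce` helper, is a hypothetical illustration and not the nv50_ir data structures.

```python
# Each instruction: (dest, op, sources). dest is None for instructions kept
# for their side effect. The IR is a made-up illustration.

def dce(insns):
    """Drop instructions whose result is never used, repeating until stable."""
    insns = list(insns)
    while True:
        used = {src for _, _, srcs in insns for src in srcs}
        kept = [i for i in insns if i[0] is None or i[0] in used]
        if len(kept) == len(insns):
            return kept
        insns = kept

shader = [
    ("a", "mul", ["x", "y"]),
    ("c", "set_lt", ["a", "z"]),   # the condition
    (None, "bra_if", ["c"]),       # branch over two empty blocks
    (None, "export", ["x"]),
]

# Step 1: the branch guards nothing, so it is removed.
without_branch = [i for i in shader if i[1] != "bra_if"]

# Step 2: DCE now sees that "c", and then "a", are unused.
print(len(dce(shader)))      # 4, the branch still keeps the condition alive
print(dce(without_branch))   # [(None, 'export', ['x'])]
```

Running this before register allocation is what lets the freed values reduce the GPR count; the same cleanup after RA would shrink the instruction count only.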
20:55 karolherbst: had a lot of fun with it
20:55 karolherbst: :D
20:56 karolherbst: like joinats 0x3fffff ...
20:56 karolherbst: in the generated binary
20:56 karolherbst: because joinat BB:5 is broken when BB:5 isn't reachable anymore
20:58 hakzsam_: yeah, you also need to remove those joinats when they don't refer to a reachable BB
20:58 karolherbst: yeah, I already do that
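A minimal sketch of that cleanup, assuming a toy CFG stored as a dict and a hypothetical `joinats` map from a block to its joinat target: compute the set of blocks reachable from the entry and keep only joinats whose target is still in it. The names are illustrative only.

```python
def reachable(cfg, entry):
    """Return the set of blocks reachable from entry by following edges."""
    seen, stack = set(), [entry]
    while stack:
        bb = stack.pop()
        if bb not in seen:
            seen.add(bb)
            stack.extend(cfg.get(bb, []))
    return seen

def drop_stale_joinats(cfg, entry, joinats):
    """joinats: block -> joinat target. Keep only entries with a live target."""
    live = reachable(cfg, entry)
    return {bb: tgt for bb, tgt in joinats.items() if bb in live and tgt in live}

# BB:5 lost its only incoming edge when empty blocks were removed.
cfg = {"BB:0": ["BB:6"], "BB:5": ["BB:6"], "BB:6": []}
joinats = {"BB:0": "BB:5"}

print(drop_stale_joinats(cfg, "BB:0", joinats))  # {}
```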
20:59 karolherbst: there are some shaders which produced like 150BBs and had 200 instructions
20:59 karolherbst: and most of those BBs got joinat/join pairs
20:59 hakzsam_: yeah I see
21:00 karolherbst: hakzsam_: and I have to create new edges if I remove empty BBs in between and have to change the target of the flow instructions :)
21:00 hakzsam_: right
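The edge rewiring mentioned above can be sketched on the same kind of toy CFG. The helper `remove_empty_block` is hypothetical; a real pass would also retarget the flow instruction at the end of each predecessor, which is only hinted at in the comment here.

```python
def remove_empty_block(cfg, empty):
    """Bypass `empty`, which must have exactly one successor.

    Every edge pred -> empty becomes pred -> successor; in a real compiler the
    branch target of the flow instruction in pred would be updated the same way.
    """
    (succ,) = cfg[empty]
    new_cfg = {}
    for bb, succs in cfg.items():
        if bb == empty:
            continue
        rewired = []
        for s in succs:
            t = succ if s == empty else s
            if t not in rewired:   # avoid duplicate edges
                rewired.append(t)
        new_cfg[bb] = rewired
    return new_cfg

# if (condition) { /* empty */ } else { /* empty */ }
cfg = {"A": ["T", "F"], "T": ["E"], "F": ["E"], "E": []}
cfg = remove_empty_block(cfg, "T")
cfg = remove_empty_block(cfg, "F")

print(cfg)  # {'A': ['E'], 'E': []}
```

After both empty blocks are bypassed, A has a single successor, so its branch is no longer conditional.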
21:38 karolherbst: huh
21:38 karolherbst: 2 piglit fails less with my pass
21:38 imirkin_: which ones
21:38 imirkin_: (some piglits are flakey)
21:39 karolherbst: creating a report
21:39 karolherbst: glean.fbo
21:40 karolherbst: and ext_framebuffer_multisample.no-color 8 depth single
21:40 karolherbst: ohh wait
21:40 karolherbst: the latter is a dmesg-fail now
21:40 karolherbst: but that's my fault
21:40 karolherbst: so, no regression through my pass :)
21:41 karolherbst: yet
21:41 imirkin_: some of the ms depth ones are flakey
21:57 karolherbst: okay
22:32 mwk: well
22:33 mwk: I just pulled out a few beers, cloned llvm git, and did mkdir lib/Target/Falcon
22:33 mwk: let's see what happens
22:43 airlied: mwk: drink more beers, write code, sleep, wakeup, forget how you wrote the code or how it works
22:44 mwk: hmm
22:44 mwk: airlied: sounds like a plan
22:44 mwk: oh yay, I get to pick a target triple
22:44 mwk: falcon-unknown-unknown
22:45 karolherbst: mwk: but it would be really awesome to have a C compiler for falcons :)
22:45 mwk: or should I have separate targets for the various versions of falcon
22:45 karolherbst: and then we rewrite everything in C and the binaries are like 50% smaller :D
22:45 mwk: I mean, it's not like you can actually mix v3 with v5 code
22:45 karolherbst: mwk: right
22:45 karolherbst: the ISA is different
22:46 mwk: I'm thinking three targets
22:46 mwk: falcon0, falcon3, falcon5
22:46 karolherbst: mwk: do you know what? maybe we should also add test for every instruction
22:46 karolherbst: mwk: I doubt it will work like that
22:46 mwk: v4 is really v3 with some additions
22:46 mwk: but then
22:46 mwk: you know, I've been thinking *a lot* about llvm for Falcon in the last weeks
22:46 karolherbst: maybe start with v3 first
22:46 mwk: and I'm thinking we could go with 16-bit void* for v3, 32-bit for v4
22:47 karolherbst: because that makes the most sense
22:47 imirkin_: that won't be confusing at all
22:47 imirkin_: why not use sign-magnitude and 38-bit words
22:47 karolherbst: :D
22:47 imirkin_: er, 36 i guess
22:47 karolherbst: mwk: can we have... no pointers? :D
22:47 mwk: imirkin_: well, if you depend on void * size, you deserve whatever happens...
22:48 imirkin_: eh... people depend on it being 4 or 8, i think
22:48 mwk: the natural pointer size for Falcon is 32-bit for v4 and up, no questions
22:48 imirkin_: although i guess regular ol' 16-bit code had it as 16 bits...
22:48 mwk: but for v0/v3, 16-bit can save you some space
22:48 mwk: and space is at a premium for those
22:49 karolherbst: well
22:49 karolherbst: we know the code we need to compile
22:49 karolherbst: that's a big win
22:49 mwk: karolherbst: C with no pointers is no C :p
22:51 karolherbst: mhh one thing I was thinking about since last time: do we actually _need_ those process things? Or is it just made up IPC to somehow separate space?
22:51 mwk: I'm still not sure why on earth would process things be something that the compiler cares about
22:51 karolherbst: because in the end falcons don't do much when there is no interrupt or idle loops
22:51 karolherbst: mwk: well the fuc code gets assembled into one big binary
22:52 mwk: sure, and?
22:52 karolherbst: I am just curious why there are those processes at all?
22:52 mwk: huh? that's not the compiler's problem
22:52 karolherbst: right, so this is all software only
22:53 karolherbst: there is no hw specific need to have those
22:53 mwk: of course
22:53 mwk: you could have everything in a big event loop
22:53 karolherbst: okay
22:53 mwk: or you could have fully-preemptible pseudo-OS like nvidia PMU
22:53 karolherbst: well
22:53 mwk: it's still not the compiler's problem and I'm not going to think about it at all
22:53 karolherbst: k
22:54 karolherbst: full-preemptible sounds like fun though
22:55 mwk: feel free to write a context switcher
22:55 mwk: some assembly required, though
22:55 mwk: like for every OS on the planet
22:56 karolherbst: well I think for now using interrupts is enough :)
22:56 karolherbst: and those funny alarms
22:57 mwk: well, here goes
22:57 mwk: falcon3-unknown-unknown
22:57 mwk: makes it sound like a bastard child...
22:58 karolherbst: :D
22:58 mwk: or should it be falcon3-nvidia-unknown
22:58 karolherbst: what is the third thing again?
22:58 mwk: OS
22:58 karolherbst: ohh
22:58 mwk: ... falcon3-nvidia-nouveau?
22:58 karolherbst: yeah
22:58 karolherbst: makes somewhat sense
22:58 mwk: it
22:59 mwk: it's the second part that nobody cares about
22:59 karolherbst: like the last part is important on the falcons
22:59 mwk: the last part *can* be important
22:59 karolherbst: I know
22:59 mwk: the second part is essentially meaningless
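For readers unfamiliar with the naming, the three fields being discussed are architecture, vendor and OS. A small sketch that splits the candidate names from this conversation is shown below; it deliberately handles only the three-field form (real LLVM triples may carry an optional fourth environment field), and the `Triple` type and `parse_triple` helper are made up for illustration.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    arch: str
    vendor: str
    os: str

def parse_triple(text):
    """Split an arch-vendor-os string; three-field triples only."""
    arch, vendor, os_name = text.split("-")
    return Triple(arch, vendor, os_name)

for t in ("falcon3-unknown-unknown", "falcon3-nvidia-nouveau"):
    print(parse_triple(t))
```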
23:00 mwk: you know
23:00 karolherbst: ohh fun, let's add this as a dependency for building the nouveau module :D
23:01 mwk: once nvidia sees the light and uses our Falcon compiler, upstreamed to LLVM proper
23:01 mwk: they're going to be called falconv3-nvidia-nvrm
23:01 mwk: and the last part will choose our ultra-new optimized calling convention, vs. their old calling convention!
23:02 mwk: like x86_64 on windows vs linux :p
23:02 karolherbst: mhh
23:02 karolherbst: but in the end it doesn't really matter though
23:02 karolherbst: it's not like we really care about how the falcons do function calls or something
23:02 mwk: yeah, I don't care either
23:03 karolherbst: we still have just crappy ways to talk with them :D
23:03 mwk: it's just that... the triples sound all serious and all
23:03 karolherbst: but implementing all that in C is nice :)
23:03 karolherbst: right
23:04 karolherbst: mwk: do you know what? When we have all the falcon stuff written in C.. we could like run the code on the host to test it...
23:05 karolherbst: kind of
23:05 mwk: hehe, falcon unittests
23:06 karolherbst: yeah, and I could like test the dynamic reclocking code also on the host
23:07 karolherbst: the more I think about it, the more I want to have that
23:09 karolherbst: mwk: never looked into how in llvm you define/add a new target
23:09 karolherbst: mwk: is it mostly translation from a pseudo ISA to the real one and declaring stuff or is there more?
23:12 mwk: karolherbst: it's complex
23:12 mwk: first and foremost I have to write a mapping from so-called ISel DAG to MachineInstruction
23:13 mwk: then I need some way to transform MachineInstructions to some output
23:13 mwk: ie. assembly or binary
23:14 mwk: the default is assembly, but binary is not much harder
23:14 mwk: and if I do that, I can throw in an asm parser, and get an assembler for free
23:15 mwk: that gives me a LLVM IR -> Falcon compiler
23:15 mwk: then I have to write a simple target description for clang, and I get a C/C++ compiler
23:15 karolherbst: I am looking through the documentation page currently, sounds a bit much indeed
23:15 mwk: also, llvm has a linker, lld
23:16 karolherbst: but we will only support static linking anyway?
23:16 mwk: which also reuses the work on binary output
23:16 mwk: of course
23:16 mwk: but we'll have a problem with the limited RAM space on Falcons
23:17 mwk: so we'll have to make some custom mechanism to do overlays
23:17 mwk: and/or paging
23:17 karolherbst: how much RAM space do we have?
23:17 karolherbst: and how big is the stack for the registers?
23:17 mwk: but I think that's mostly orthogonal to the compiler
23:18 mwk: that depends on the Falcon
23:18 mwk: some falcons have as little as 2.5kiB of code and 2.5kiB of data
23:19 mwk: PMU has more, eg. 24kiB of code
23:19 karolherbst: ahh okay
23:19 karolherbst: could those be subtargets?
23:19 mwk: uh, why?
23:19 mwk: we don't care about code RAM size in the compiler
23:19 mwk: we care about the ISA subset
23:19 karolherbst: mhh, maybe we could just limit ourselves a bit and not page/whatever
23:20 karolherbst: and then we say: there is just the data/code space and that's what we have
23:21 karolherbst: and if you want to have some memory, use global/function static stuff and do the things
23:22 mwk: uh?
23:23 mwk: the RAM used for stack and globals/statics is exactly the same
23:23 karolherbst: mhh, sometimes I still think too high-level...
23:23 mwk: matter of fact, statics are worse
23:23 mwk: stack only takes up space if the function is currently executing, global takes up space always
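The point about statics versus stack can be put into numbers with a deliberately simplified model: three functions that run one after another, never call each other, and each need one scratch buffer. The function names and sizes are invented for illustration.

```python
# Hypothetical scratch-buffer sizes in bytes for three functions that run one
# after another and never call each other.
buffers = {"parse_cmd": 256, "reclock": 512, "report": 128}

static_cost = sum(buffers.values())   # statics are all resident at once
stack_peak = max(buffers.values())    # only the running function's frame exists

print(static_cost, stack_peak)  # 896 512
```

With nested calls the stack peak would be the deepest call chain rather than the single largest frame, but it still never exceeds the sum that statics would occupy.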
23:24 karolherbst: right
23:24 karolherbst: yeah, I was being a bit stupid
23:24 karolherbst: have to remember the time I was developing on an ARM dev board, without an OS
23:24 karolherbst: but that's like 3 years away now? :/