00:01calim: yeah flattening ignores the code because there is no conditional
00:02calim: can you just safely delete the joinat/join yourself ?
00:02karolherbst: I was already thinking about this, but I wanted to let flattening do this
00:03karolherbst: but if an empty branch is also empty when it just contains bra/join/joinats
00:03karolherbst: then I could remove those in empty bbs
00:03karolherbst: but mhh
00:03karolherbst: then I have to lookup where the joinat points to
00:03karolherbst: because the join might be useful
00:07calim: that's why the flattening pass only deals with simple if-else-endif cases
00:09calim: either way you have to add extra code to check if the situation allows for removal of the joinat/join
00:09karolherbst: calim: when the first instruction of a BB is a join and the last instruction of the "previous" BB is the joinat, those can be removed, right?
00:10karolherbst: or is there a more relaxed rule I can use?
00:11imirkin_: if there's a direct path from BB A -> BB B, you can nuke the joinat/join pair. it's only if there are conditional branches in between.
00:12karolherbst: ahh okay
00:12calim: yes ... joinat BB:n, join == bra BB:n
00:12karolherbst: well I nuke empty branches in between away, so I should detect those
00:13karolherbst: imirkin_: so if there is an edge connection joinat.bb and join.bb directly, those can be nuked?
00:13calim: with BB:n from whichever the last joinat you encountered was
00:13karolherbst: *connecting
00:13imirkin_: karolherbst: pretty much yeah
00:13calim: the join stuff is only useful for divergent branches
00:13karolherbst: okay
00:13karolherbst: that should be easy then
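[Editor's sketch] The rule discussed above (a joinat/join pair can go when the joinat block reaches the join block without any conditional branch in between) can be illustrated with a toy control-flow graph. This is a minimal sketch, not the nouveau codegen code: the dict-based CFG and the name `can_drop_join_pair` are assumptions made for illustration only.

```python
def can_drop_join_pair(cfg, joinat_bb, join_bb):
    """Return True if join_bb is reached from joinat_bb through blocks
    that each have exactly one successor (no conditional branch).

    cfg maps a block name to the list of its successor block names
    (hypothetical representation, not the real IR).
    """
    seen = set()
    cur = joinat_bb
    while cur not in seen:
        if cur == join_bb:
            return True
        seen.add(cur)
        succs = cfg.get(cur, [])
        if len(succs) != 1:
            # Zero successors (exit) or a divergent branch: keep the pair.
            return False
        cur = succs[0]
    return False  # looped without ever reaching join_bb


# Straight line A -> M -> J: the pair is redundant.
straight = {"A": ["M"], "M": ["J"], "J": []}
# Diamond A -> {T, E} -> J: the branch may diverge, so the pair stays.
diamond = {"A": ["T", "E"], "T": ["J"], "E": ["J"], "J": []}

print(can_drop_join_pair(straight, "A", "J"))
print(can_drop_join_pair(diamond, "A", "J"))
```

The check is deliberately conservative: anything other than a single-successor chain keeps the pair, which matches calim's point that the join machinery only matters for divergent branches.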
00:19karolherbst: imirkin_: wouldn't that fit more in something like DCE? or some other pass, because nuking useless joinat/join away makes sense anyway. Or are those only produced when actually needed?
00:21imirkin_: those tend to only be produced when needed
00:21imirkin_: if you're going around removing blocks, that can mess things up
00:30karolherbst: k
00:46rardiol: my laptop just sort of hanged. after rebooting, I have this in the logs: http://nixpaste.lbr.uno/raw/qR78hK41 . looks like a nouveau problem?
00:46imirkin_: yeah. it's an unsolved, undiagnosed problem
00:47imirkin_: May 06 00:31:17 kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 3 [X[781]] get 002000c8a8 put 002000c8b4 ib_get 000002e3 ib_put 0000036d state 80004610 (err: INVALID_CMD) push 00400040
00:47imirkin_: that happens sometimes
00:47imirkin_: no idea why or how to fix it.
00:50rardiol: imirkin_: me even less, but thanks for the info. Can I do something about it?
01:11imirkin: well, this particular issue only happens on nv50-era gpu's... but beyond that i don't know much about it
01:14rardiol: I just checked, I seem to have gotten it quite a few times already. Don't remember if I just rebooted and didn't bother checking the cause or what.
01:15rardiol: There's some errors involving DMA_PUSHER, but the surrounding nouveau errors are different
01:15imirkin: sometimes those errors kill, other times they're harmless
01:16rardiol: ah, that explains why I didn't see it.
01:16imirkin: the surrounding errors tend to be a direct result of that pusher error
01:24rardiol: but I get some errors before DMA_PUSHER
01:29rardiol: would extra logging give some useful info? or just the logs I have now?
01:31imirkin: sorry, no clue
08:59pq: I got an email for mmiotrace's printing on the PCIDEV lines, that it may need some fixing. Who wants to get CC'd with my reply?
09:00pq: I already have karolherbst. How about mwk? skeggsb?
09:00pq: who's been working with mmiotrace, more importantly, written tools to parse the traces?
10:05mwk: pq: give me that CC
10:06pq: mwk, cool, what's your email?
10:13mwk: koriakin@0x04.net
10:16pq: mwk, thanks, sent. Decided to cc Ben too.
10:36karolherbst: pq: already replied without looking here first :D
10:36karolherbst: pq: well I think it would be nice to have mmiotrace working on ppc too though
10:37karolherbst: and I think this would be the only archs we care about now? x86 and ppc.
10:37karolherbst: No idea if nvidia gpus might work on others
10:39karolherbst: pq: maybe it would make sense to note which projects heavily rely on that tool, just so that we get notified faster about changes, but... changes never really happen there :/
10:40karolherbst: although it is a great tool
10:40RSpliet: karolherbst: if you want to get cracking on Tegra, I'd go for ARM and ARM64 too :-P
10:41karolherbst: RSpliet: where is the closed nvidia driver?
10:41karolherbst: or is there one?
10:41RSpliet: .. mostly in user-space :-D
10:41karolherbst: right ... :D
10:42karolherbst: anyway, getting tegra documentation shouldn't be the problem
10:42karolherbst: and if so, we force gnurou to deal with the issue
10:42RSpliet: well... depends on what you want to know
10:42RSpliet: but hmm... I don't think envytools works with platform devices, does it?
10:42karolherbst: mhh
10:42karolherbst: no idea
10:43karolherbst: are those mcp79 ethernet cards normal PCI devices?
10:43karolherbst: "cards"
10:44RSpliet: yes
10:44karolherbst: yeah, then I have no clue
10:44RSpliet: I think I tried it on my Uni's K1 a while ago, didn't work
10:44karolherbst: could be easy to fix though
10:45karolherbst: in either case, that doesn't matter much for mmiotrace anyway
10:46karolherbst: well I think it would really make sense to get it working for ppc though
10:46karolherbst: maybe not so much today, but maybe ppc desktops will be a thing again
11:24dcomp: So I've been comparing my mmiotrace with the other gm108
11:25dcomp: PSTRAPS.STRAPS0_PRIMARY RAMCFG is 0x2 on mine and 0x4 on the other
11:51karolherbst: dcomp: if you want to find out what goes wrong with memory reclocking on your ddr3 gm108 you should take a look at the SEQ scripts
11:52RSpliet: ramcfg is a relatively arbitrary number that only has meaning within the context of your VBIOS
11:52RSpliet: I'm guessing the two VBIOSes differ as well
11:57karolherbst: RSpliet: the problem is, that this will be a really minor difference between kepler and maxwell :/
11:57karolherbst: RSpliet: maybe even something stupid as the issue with gddr5 earlier
11:57karolherbst: I couldn't find any obvious difference between the kepler and maxwell ddr3 traces
11:57karolherbst: just the random card variance
11:59karolherbst: dcomp: what might help is tracing nvidia controlled (use marks to mark your reclocking) and then trace nouveau and reclock too
12:00karolherbst: and maybe those generated SEQ scripts in both traces differ in a fundamental point
12:00karolherbst: usually they should be quite close
12:11pq: karolherbst, btw. maybe you could propose adding the nouveau@ list into MAINTAINERS for mmiotrace or something like that, so people would know to CC it.
12:11karolherbst: pq: that would actually make sense, right
12:11pq: I don't really know what the kernel practice is, though
12:13karolherbst: well if you are in MAINTAINERS you have to be aware that users come to you and complain about broken stuff :p
12:13karolherbst: I think
12:14pq: I mean, is there a place for a mailing list in MAINTAINERS or is it just people
12:14karolherbst: everything goes
12:14karolherbst: L: for lists
12:14karolherbst: M: for people I think
12:14pq: cool
12:14karolherbst: or the maintainer
12:15karolherbst: yeah
12:15karolherbst: multiple L: entries seem to be fine
12:15karolherbst: though I think at least one M: might be needed
12:15karolherbst: Orphan ones have no M sometimes
12:16pq: isn't mmiotrace orphaned? :-)
12:16karolherbst: nope
12:16karolherbst: issues get fixed as you see :D
12:17karolherbst: maybe officially orphaned
12:17karolherbst: but we do care a lot about it
12:17pq: sure, but as in, is anyone really a maintainer, to want to review the patches touching it?
12:18karolherbst: I think we would gladly do it
12:18karolherbst: it would be really bad if we do a kernel update and all of a sudden mmiotrace doesn't work anymore
12:18karolherbst: so we want to at least test the patches
12:19pq: indeed, up to you :-) I won't have time for that.
12:19karolherbst: I think adding the nouveau ML makes sense
12:19karolherbst: then somebody will maybe pick it up
12:27karolherbst: pq: I think I will write Ingo and Steven about this, because they seem to be mostly involved with the tracers in general
12:30pq: a good idea
14:10dcomp: anyone got an mmiotrace of gm107
14:12RSpliet: karolherbst: that could simply mean that the Kepler code is incomplete - but only on a feature that wasn't used in practice
14:13RSpliet: (like "DLL")
14:13karolherbst: yeah, most likely
14:26karolherbst: I think I am done with the messy pass by the way :)
14:42mupuf: so, the jetson regressed very badly
14:42mupuf: now, it seems like a fence never gets signaled
14:43mupuf: the funny thing is, when I use the same version of mesa and the kernel as a month back, it still does not work
14:43mupuf: let's see if there were libdrm patches
14:56karolherbst: yay
14:56karolherbst: total instructions in shared programs : 2604639 -> 2583239 (-0.82%) and total gprs used in shared programs : 357515 -> 356476 (-0.29%)
14:56karolherbst: and one hurt
14:56karolherbst: ...
14:57karolherbst: helped: 574 gpr 1595 inst/bytes
14:57karolherbst: and no shader affected in the official shader-db ...
15:10karolherbst: ohh mostly saints row benefits from this
15:11karolherbst: wow
15:11karolherbst: total instructions in shared programs : 329818 -> 313198 (-5.04%) and total gprs used in shared programs : 33615 -> 32635 (-2.92%) in saints row 3/4
15:11mupuf: that is ... sizeable
15:11karolherbst: yeah
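[Editor's sketch] The percentages in the shader-db style summaries quoted above are plain relative changes, and can be re-derived from the raw counts. The helper name `percent_change` is an assumption for illustration; it is not part of any shader-db tooling.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Counts copied from the two summaries in the log.
all_shaders_inst = percent_change(2604639, 2583239)
all_shaders_gpr = percent_change(357515, 356476)
saints_row_inst = percent_change(329818, 313198)
saints_row_gpr = percent_change(33615, 32635)

for label, value in [
    ("all shaders, instructions", all_shaders_inst),
    ("all shaders, gprs", all_shaders_gpr),
    ("saints row 3/4, instructions", saints_row_inst),
    ("saints row 3/4, gprs", saints_row_gpr),
]:
    print(f"{label}: {value:.2f}%")
```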
15:11mupuf: especially since the join instructions may be extra costly
15:11karolherbst: I don't remove joins
15:12karolherbst: this is just useless crap
15:12karolherbst: like
15:12karolherbst: if (condition) { //empty } else { //empty }
15:12karolherbst: nothing else
15:12karolherbst: and the condition gets DCEed away after my pass ran
15:12mupuf: I see, they use a lot of macros I guess :D
15:12karolherbst: and sometimes it nukes 10% of a shader
15:12mupuf: DCE?
15:13karolherbst: dead-code-elimination
15:13mupuf: ack
15:17mupuf: #define NOUVEAU_FENCE_MAX_SPINS (1 << 31) <-- that sounds overly long to me, but also does not tell anything about the actual time it takes. I would say something like a minute would be nice
15:17mupuf: on the nvea, it takes something like 5 minutes to expire
15:18imirkin_: yeah, it's an incredibly long time
15:18imirkin_: and it's never supposed to ever ever happen. ever.
15:18imirkin_: ever. never.
15:18imirkin_: never ever.
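[Editor's sketch] mupuf's point is that a spin count such as (1 << 31) bounds iterations, not time, so how long the wait lasts depends on how fast the CPU spins. A wall-clock bound behaves the same on every machine. The sketch below only illustrates that idea in Python; it is not the Mesa fence code, and `wait_for_fence`, `is_signaled` and the 60 second default are assumptions taken from the "something like a minute" suggestion.

```python
import time


def wait_for_fence(is_signaled, timeout_s=60.0, clock=time.monotonic):
    """Poll is_signaled() until it returns True or timeout_s elapses.

    Returns True if the fence signaled, False on timeout. The clock
    argument exists so tests can inject a fake time source.
    """
    deadline = clock() + timeout_s
    while not is_signaled():
        if clock() >= deadline:
            return False
    return True


# A fence that is already signaled returns immediately.
print(wait_for_fence(lambda: True))

# A fence that never signals gives up once the (fake) clock passes
# the deadline, regardless of how many polls that takes.
fake_ticks = iter(range(1000))
print(wait_for_fence(lambda: False, timeout_s=5, clock=lambda: next(fake_ticks)))
```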
15:18mupuf: yep, I know, I am trying to figure out why this is happening
15:19imirkin_: that means we messed something up while attempting to write the fence
15:19imirkin_: or the gpu is hung
15:20mupuf: imirkin_: the kernel would say something, right?
15:20mupuf: and FYI, running wflinfo is enough ... 50% of the time
15:20imirkin_: should yea
15:20imirkin_: lol
15:21imirkin_: well THAT could be something a lot more harmless
15:21imirkin_: since wflinfo probably doesn't do a lot of rendering
15:21mupuf: right
15:21imirkin_: so that could be us doing something dumb
15:21mupuf: piglit tests are all hitting it though
15:21imirkin_: ah ok
15:21mupuf: I am using GBM though
15:22mupuf: wait a sec, maybe last time, I did run something before running piglit
15:22mupuf: like, a KMS demo or something
15:22imirkin_: one reason this could happen is if the gpu's writes aren't visible on the cpu
15:22imirkin_: due to some cache screwup
15:22imirkin_: although i thought gnurou fixed that
15:22mupuf: yeah, but I would not expect that to happen on a 4.5 kernel
15:23imirkin_: not a kernel thing
15:23imirkin_: it has to do with how the fence bo is created
15:23mupuf: well, running kmscon was enough to hang the machine
15:23imirkin_: sure, if fences never updated pretty much everything gets sunk
15:24mupuf: yeah, but why is my ssh access going down then ;)
15:25imirkin_: my tk1 hangs after a few minutes usually
15:25imirkin_: (without even loading nouveau)
15:25mupuf: mine has been very stable
15:26mupuf: until yesterday
16:03drathir: guys 110'C for passive gpu is critical?
16:04drathir:wonder if that have any thermal emergency shutdown when too high temp...
16:05drathir: G84
16:07imirkin_: yes, that's high
16:07mupuf: drathir: yes, it is close to the absolute max which is supposed to be 132°C in most G84
16:08mupuf: clean your damn GPU and make sure there is an airflow in your machine ;)
16:20drathir: mupuf: imirkin_ thanks a lot... that means either the gpu is damaged or there is no thermal paste on board i guess...
16:21drathir: mupuf: its passive clean one.. mb have 46'C
16:21karolherbst: drathir: listen to what mupuf said
16:21karolherbst: drathir: air flow.. very important
16:23drathir:guess need attach some hand maded fan i guess... or there is really no pasta left....
16:25karolherbst: drathir: well maybe you can plug in your GPU in a different slot?
16:25karolherbst: drathir: usually a fan providing airflow through the case should be enough
16:26drathir: karolherbst: honestly even not sure if that card is fully functional...
16:28drathir: karolherbst: its card survived the go to trash scenario...
17:00karolherbst: anybody want to maintain the mmiotracer?
17:01karolherbst: I could, but I have less knowledge about it than pq and pq doesn't want to :p
17:02pq: I've replaced all my kernel knowledge with Wayland and Weston :-)
17:02karolherbst: :D
17:02karolherbst: k
17:33dcomp: is the mmiotracer meant to trace the current instruction pointer?
17:34imirkin_: karolherbst: did you get this when building the new apitrace +qt5 ? http://hastebin.com/juzecuhuvo.php
17:36drathir: karolherbst: mupuf few min fan equipped 42'C ;p
17:36mupuf: karolherbst: you know better than anyone here
17:36karolherbst: imirkin_: qt5.6?
17:37drathir: but fresh after restart for sure go up ;p
17:38karolherbst: mupuf: meh, and I think I know like 20% :/
17:38mupuf: drathir: well, now it is really cold for a G84
17:38karolherbst: dcomp: well I have a silently cooled mcp79 in my mac mini and it usually reaches 90°C in idle
17:40dcomp: drathir: ^
17:41drathir: mupuf: for sure should go up, its fresh booted up...
17:47drathir: karolherbst: mupuf thats fan is from amd gpu fabric cooling system... performance shouldnt be so bad in theory...
17:48drathir: amd gpu/amd cpu*
17:52imirkin_: karolherbst: i think 5.5
17:53imirkin_: karolherbst: 5.5.1
18:03karolherbst: imirkin_: well with current master I don't get any build failures
18:03karolherbst: imirkin_: ohh you mean the ebuild?
18:08drathir: karolherbst: mupuf thanks a lot for help...
18:23imirkin_: karolherbst: yes
18:23imirkin_: karolherbst: the 7.1 ebuild
18:30karolherbst: imirkin_: compiled just fine here
18:30imirkin_: hrmph
18:30imirkin_: against qt 5.5?
18:30imirkin_: or 5.6?
18:31karolherbst: 5.6
18:34imirkin_: hrm... looks like it's going down the QT_OPENGL_ES_2 path
18:35karolherbst: imirkin_: right, you shouldn't build qt5 with egl because of stupid reasons
18:35imirkin_: ah.
18:36karolherbst: at least in older versions turning on EGL meant turning off OpenGL
18:36karolherbst: and EGL meant GLES
18:36imirkin_: got it.
18:37karolherbst: mhh maybe egl was fine in qtgui,
18:37karolherbst: have to check
18:37karolherbst: imirkin_: the qtopengl ebuild does this: "-opengl $(usex gles2 es2 desktop)"
18:39karolherbst: maybe I had compile issues somewhere with qtgui[egl] and qtopengl[-gles2]
18:39karolherbst: but there was something funky
18:41imirkin_: heh
18:41karolherbst: well I am 100% sure that qtopengl[gles2] is bad :D
18:42imirkin_: yeah
18:42imirkin_: i'm doing -gles2 on both qtgui and qtopengl
18:42karolherbst: right
18:42karolherbst: maybe the EGL thing is fixed by now
18:42imirkin_: and hopefully that'll fix it all right up
18:42imirkin_: there is no egl on qtopengl
18:42karolherbst: qtgui
18:42imirkin_: ah, i have it on for that one
18:42karolherbst: yeah, but it should be fine I think
18:43karolherbst: maybe some bad deps
18:43karolherbst: I know there was a reason I disabled it
18:43karolherbst: can't remember anymore
18:43imirkin_: wtvr
18:47karolherbst: imirkin_: what parameters are the best to just check opengl rendering stuff in piglit?
18:48imirkin_: not sure what you mean
18:49karolherbst: ./piglit run -p glx gpu
18:49karolherbst: I think
18:49karolherbst: just to test changes in codegen
18:49imirkin_: https://people.freedesktop.org/~imirkin/
18:49karolherbst: ahh
18:49karolherbst: thanks
18:50karolherbst: I test without -1 :)
18:50imirkin_: you like to live dangerously i guess
18:51karolherbst: does reclocking speed it up somewhat?
18:53karolherbst: imirkin_: do you know if piglit allows you to add a regular expression to filter out dmesg messages?
18:53karolherbst: so that piglit doesn't think it has anything to do with the run
18:53imirkin_: don't think so
18:53imirkin_: at least i'm not aware of it
18:53karolherbst: okay
18:53karolherbst: because linux is stupid and always writes into dmesg if my wifi changes the channel..
18:54karolherbst: or the channel width
18:54karolherbst: ...
18:55imirkin_: i think those messages are ignored
18:55imirkin_: info messages are, i think
18:56karolherbst: okay
18:57karolherbst: what is streaming-texture-leak for?
19:23RSpliet: <rant>
19:24karolherbst: imirkin_: how fast did parallel piglit runs hang the GPU for you?
19:24RSpliet: on this entire planet there is only *ONE* datasheet mentioning the lo and hi CL/CWL selection bit in EMRS2
19:25karolherbst: :D
19:25RSpliet: guess which ones are on my one GDDR3 card... ugh
19:25RSpliet: produced by a fine German company, bankrupt since 2009
19:25karolherbst: :D
19:25RSpliet: they deserved it...
19:25RSpliet: </rant>
19:26karolherbst: RSpliet: but you mean a "physical" datasheet? because otherwise "one" makes hardly sense :p I am sure nvidia also has some
19:42RSpliet: ah there we go... I should archive all these datasheets
19:43imirkin_: karolherbst: i dunno... kinda random. allegedly ben says it doesn't hang his gpu's much anymore
19:44imirkin_: RSpliet: which gpu?
19:45RSpliet: imirkin_: NVA3/2
19:46RSpliet: although my NVA0 doesn't have said bit
19:46RSpliet: (just pulled that monster apart)
19:51imirkin_: heh
20:12karolherbst: running saints row 4 with the gallium_hud was really interesting though. I managed to halve the executed instructions without affecting the fps at all...
20:36karolherbst: imirkin_: if you want to take a look at this (currently running piglit, but it looks good so far): https://github.com/karolherbst/mesa/commit/1b2c6f0fab00fc34a79c079f6d6f7cf509edd48c
20:37imirkin_: karolherbst: i think you want visit(BasicBlock *bb)
20:37karolherbst: yeah I saw it already
20:38karolherbst: I needed this earlier with a slightly worse design
20:38imirkin_: :)
20:38karolherbst: but rebindFlowInstructions is really expensive
20:38karolherbst: I think I only want to run it once
20:38karolherbst: and splitting into two passes doesn't make sense
20:39imirkin_: i think there's a wildly simpler way to do it
20:39imirkin_: i'll have a look later
20:39karolherbst: I hope there is
20:39karolherbst: because all my simpler approaches failed somewhere
20:40karolherbst: okay, thanks
20:43hakzsam_: karolherbst, maybe you can have a look at the branch perf counters to see if your pass improves efficiency :)
20:44karolherbst: hakzsam_: nouveau has bigger issues with saints row :D
20:44karolherbst: hakzsam_: but yeah, I planned to
20:44hakzsam_: like what?
20:44karolherbst: hakzsam_: cutting instruction in half in GALLIUM_HUD improved performance by... <5%
20:45karolherbst: *count
20:45karolherbst: so I just disabled a bunch of graphical settings to reduce the invoked instructions, but the perf didn't change much
20:45hakzsam_: yeah okay
20:46karolherbst: hakzsam_: but my pass for saints row 3/4 shaders: total instructions in shared programs : 329818 -> 313198 (-5.04%) and total gprs used in shared programs : 33615 -> 32635 (-2.92%)
20:46hakzsam_: that removes unused instructions in shaders, cool
20:46karolherbst: yeah
20:46karolherbst: this was the main goal :D
20:46hakzsam_: sure I know
20:47karolherbst: sadly no change in public shader-db
20:47karolherbst: and usually only eon based games are affected
20:48hakzsam_: that's a good start for eon games which are really badly supported with nouveau anyway
20:48karolherbst: ohh wait
20:48karolherbst: totally untrue
20:48karolherbst: antichamber, shadow_warrior, borderlands...
20:48karolherbst: even heaven and valley
20:48karolherbst: and tomb raider
20:48hakzsam_: all of them are from eon?
20:48karolherbst: nope
20:48karolherbst: payday 2
20:49karolherbst: these are affected games by my pass
20:49karolherbst: ohh wait
20:49karolherbst: I compared wrong files...
20:49hakzsam_: :)
20:49karolherbst: ohh well
20:49karolherbst: but I know that valley and heaven have a few affected shaders
20:49karolherbst: and shadow warrior too
20:50karolherbst: but not much
20:50hakzsam_: anyway, removing unused flow instructions like branches is useful even if this doesn't really improve performance
20:50karolherbst: anyway, I hoped it would increase the perf in eon based games, but it didn't noticeably
20:50karolherbst: hakzsam_: nope
20:50karolherbst: hakzsam_: my pass didn't do this
20:50karolherbst: flattening already removed those
20:51karolherbst: it just takes care of this before RA is done
20:51karolherbst: and throws a DCE in too
20:51hakzsam_: your pass removes empty branches, right?
20:51karolherbst: hakzsam_: before RA, right
20:51karolherbst: hakzsam_: example
20:51hakzsam_: yeah, that's what I said
20:52karolherbst: hakzsam_: if (condition) { //nothing } else { // nothing }
20:52karolherbst: condition can be a result of many instructions
20:52karolherbst: so
20:52karolherbst: flattening removed this if/else/endif away
20:52karolherbst: and then condition is dead code
20:52karolherbst: but it is still computed
20:52karolherbst: so we could have a simple PostRA-DCE pass dealing with this
20:53karolherbst: (this was the way how I found those cases)
20:53hakzsam_: oh okay
20:53karolherbst: but doing this PostRA, doesn't help much with GPR count ;)
20:53hakzsam_: sure, won't help a lot :)
20:53karolherbst: and some shader are like 20% smaller with this
20:54karolherbst: so yeah, the pass eliminates empty branches, but it does so, that we can DCE conditions of conditional branches away
20:55hakzsam_: makes sense
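[Editor's sketch] The effect described above, where dropping an empty if/else leaves the condition without users so DCE can remove everything that computed it, can be shown on a toy straight-line program. This is a minimal sketch and not nouveau's DCE pass: the (dest, op, sources) tuples, the opcode names and `eliminate_dead_code` are assumptions for illustration, and each value is assumed to be defined once.

```python
def eliminate_dead_code(instructions, live_out):
    """Keep only instructions whose result is (transitively) needed.

    instructions: list of (dest, op, sources) tuples in program order.
    live_out: names that must survive, e.g. shader outputs or values
    read by a remaining branch.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so a use is seen before the definition it needs.
    for dest, op, srcs in reversed(instructions):
        if dest in live:
            kept.append((dest, op, srcs))
            live.update(srcs)
    kept.reverse()
    return kept


program = [
    ("a", "mul", ["x", "y"]),
    ("cond", "set_lt", ["a", "z"]),   # only feeds the branch
    ("color", "add", ["x", "y"]),
]

# While the (empty) if/else still exists, the branch reads cond.
print(len(eliminate_dead_code(program, ["color", "cond"])))
# Once the empty branch is gone, cond and its input a are dead.
print(eliminate_dead_code(program, ["color"]))
```

Running this kind of cleanup before register allocation rather than after is what lets the GPR count drop as well, as karolherbst notes.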
20:55karolherbst: had a lot of fun with it
20:55karolherbst: :D
20:56karolherbst: like joinats 0x3fffff ...
20:56karolherbst: in the generated binary
20:56karolherbst: because joinat BB:5 is broken when BB:5 isn't reachable anymore
20:58hakzsam_: yeah, you also need to remove those joinats when they don't refer to a reachable BB
20:58karolherbst: yeah, I already do that
20:59karolherbst: there are some shaders which produced like 150BBs and had 200 instructions
20:59karolherbst: and most of those BBs got joinat/join pairs
20:59hakzsam_: yeah I see
21:00karolherbst: hakzsam_: and I have to create new edges if I remove empty BBs in between and have to change the target of the flow instructions :)
21:00hakzsam_: right
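[Editor's sketch] Removing empty blocks means every edge that pointed at an empty block has to be redirected to that block's successor, which is the "create new edges and change the target of the flow instructions" step mentioned above. The sketch below shows the idea on a dict-based CFG; it is not the pass from the linked commit, and `bypass_empty_blocks` and the CFG representation are assumptions for illustration. Only empty blocks with exactly one successor are bypassed.

```python
def bypass_empty_blocks(cfg, empty):
    """Return a new CFG with removable empty blocks bypassed.

    cfg maps block name -> list of successor names.
    empty is the set of blocks that contain no real instructions.
    """
    def resolve(bb):
        # Follow chains of single-successor empty blocks; stop on cycles.
        seen = set()
        while bb in empty and len(cfg[bb]) == 1 and bb not in seen:
            seen.add(bb)
            bb = cfg[bb][0]
        return bb

    removable = {bb for bb in empty
                 if len(cfg[bb]) == 1 and resolve(bb) != bb}

    new_cfg = {}
    for bb, succs in cfg.items():
        if bb in removable:
            continue
        targets = []
        for succ in succs:
            target = resolve(succ)
            if target not in targets:   # merge duplicate edges
                targets.append(target)
        new_cfg[bb] = targets
    return new_cfg


# if (condition) { /* empty */ } else { /* empty */ }
diamond = {"A": ["T", "E"], "T": ["J"], "E": ["J"], "J": []}
print(bypass_empty_blocks(diamond, {"T", "E"}))
```

In the diamond case both arms collapse onto the same target, so block A ends up with a single successor. At that point the branch is no longer conditional and its condition becomes a candidate for dead-code elimination.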
21:38karolherbst: huh
21:38karolherbst: 2 fewer piglit fails with my pass
21:38imirkin_: which ones
21:38imirkin_: (some piglits are flakey)
21:39karolherbst: creating a report
21:39karolherbst: glean.fbo
21:40karolherbst: and ext_framebuffer_multisample.no-color 8 depth single
21:40karolherbst: ohh wait
21:40karolherbst: the latter is a dmesg-fail now
21:40karolherbst: but that's my fault
21:40karolherbst: so, no regression through my pass :)
21:41karolherbst: yet
21:41imirkin_: some of the ms depth ones are flakey
21:57karolherbst: okay
22:32mwk: well
22:33mwk: I just pulled out a few beers, cloned llvm git, and did mkdir lib/Target/Falcon
22:33mwk: let's see what happens
22:43airlied: mwk: drink more beers, write code, sleep, wakeup, forget how you wrote the code or how it works
22:44mwk: hmm
22:44mwk: airlied: sounds like a plan
22:44mwk: oh yay, I get to pick a target triple
22:44mwk: falcon-unknown-unknown
22:45karolherbst: mwk: but it would be really awesome to have a C compiler for falcons :)
22:45mwk: or should I have separate targets for the various versions of falcon
22:45karolherbst: and then we rewrite everything in C and the binaries are like 50% smaller :D
22:45mwk: I mean, it's not like you can actually mix v3 with v5 code
22:45karolherbst: mwk: right
22:45karolherbst: the ISA is different
22:46mwk: I'm thinking three targets
22:46mwk: falcon0, falcon3, falcon5
22:46karolherbst: mwk: do you know what? maybe we should also add test for every instruction
22:46karolherbst: mwk: I doubt it will work like that
22:46mwk: v4 is really v3 with some additions
22:46mwk: but then
22:46mwk: you know, I've been thinking *a lot* about llvm for Falcon in the last weeks
22:46karolherbst: maybe start with v3 first
22:46mwk: and I'm thinking we could go with 16-bit void* for v3, 32-bit for v4
22:47karolherbst: because that makes the most sense
22:47imirkin_: that won't be confusing at all
22:47imirkin_: why not use sign-magnitude and 38-bit words
22:47karolherbst: :D
22:47imirkin_: er, 36 i guess
22:47karolherbst: mwk: can we have... no pointers? :D
22:47mwk: imirkin_: well, if you depend on void * size, you deserve whatever happens...
22:48imirkin_: eh... people depend on it being 4 or 8, i think
22:48mwk: the natural pointer size for Falcon is 32-bit for v4 and up, no questions
22:48imirkin_: although i guess regular ol' 16-bit code had it as 16 bits...
22:48mwk: but for v0/v3, 16-bit can save you some space
22:48mwk: and space is at a premium for those
22:49karolherbst: well
22:49karolherbst: we know the code we need to compile
22:49karolherbst: that's a big win
22:49mwk: karolherbst: C with no pointers is no C :p
22:51karolherbst: mhh one thing I was thinking about since last time: do we actually _need_ those process things? Or is it just made up IPC to somehow separate space?
22:51mwk: I'm still not sure why on earth would process things be something that the compiler cares about
22:51karolherbst: because in the end falcons don't do much when there is no interrupt or idle loops
22:51karolherbst: mwk: well the fuc code gets assembled into one big binary
22:52mwk: sure, and?
22:52karolherbst: I am just curious why there are those processes at all?
22:52mwk: huh? that's not the compiler's problem
22:52karolherbst: right, so this is all software only
22:53karolherbst: there is no hw specific need to have those
22:53mwk: of course
22:53mwk: you could have everything in a big event loop
22:53karolherbst: okay
22:53mwk: or you could have fully-preemptible pseudo-OS like nvidia PMU
22:53karolherbst: well
22:53mwk: it's still not the compiler's problem and I'm not going to think about it at all
22:53karolherbst: k
22:54karolherbst: full-preemptible sounds like fun though
22:55mwk: feel free to write a context switcher
22:55mwk: some assembly required, though
22:55mwk: like for every OS on the planet
22:56karolherbst: well I think for now using interrupts is enough :)
22:56karolherbst: and those funny alarms
22:57mwk: well, here goes
22:57mwk: falcon3-unknown-unknown
22:57mwk: makes it sound like a bastard child...
22:58karolherbst: :D
22:58mwk: or should it be falcon3-nvidia-unknown
22:58karolherbst: what is the third thing again?
22:58mwk: OS
22:58karolherbst: ohh
22:58mwk: ... falcon3-nvidia-nouveau?
22:58karolherbst: yeah
22:58karolherbst: makes somewhat sense
22:58mwk: it
22:59mwk: it's the second part that nobody cares about
22:59karolherbst: like the last part is important on the falcons
22:59mwk: the last part *can* be important
22:59karolherbst: I know
22:59mwk: the second part is essentially meaningless
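[Editor's sketch] The triples being discussed have the shape arch-vendor-os, with the third field being the OS as mwk says. A minimal sketch of splitting one apart follows. It is a simplification: real LLVM triples may carry an optional fourth environment component, and the Falcon triples here are the hypothetical ones from the chat, not targets that exist in LLVM. The name `parse_triple` is an assumption for illustration.

```python
def parse_triple(triple):
    """Split a three-part target triple into its named fields."""
    arch, vendor, os_name = triple.split("-", 2)
    return {"arch": arch, "vendor": vendor, "os": os_name}


for candidate in ("falcon-unknown-unknown",
                  "falcon3-unknown-unknown",
                  "falcon3-nvidia-nouveau"):
    print(parse_triple(candidate))
```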
23:00mwk: you know
23:00karolherbst: ohh fun, lets add this as a dependency for building the nouveau module :D
23:01mwk: once nvidia sees the light and uses our Falcon compiler, upstreamed to LLVM proper
23:01mwk: they're going to be called falconv3-nvidia-nvrm
23:01mwk: and the last part will choose our ultra-new optimized calling convention, vs. their old calling convention!
23:02mwk: like x86_64 on windows vs linux :p
23:02karolherbst: mhh
23:02karolherbst: but in the end it doesn't really matter though
23:02karolherbst: it's not like we really care about how the falcons do function calls or something
23:02mwk: yeah, I don't care either
23:03karolherbst: we still have just crappy ways to talk with them :D
23:03mwk: it's just that... the triples sound all serious and all
23:03karolherbst: but implementing all that in C is nice :)
23:03karolherbst: right
23:04karolherbst: mwk: do you know what? When we have all the falcon stuff written in C.. we could like run the code on the host to test it...
23:05karolherbst: kind of
23:05mwk: hehe, falcon unittests
23:06karolherbst: yeah, and I could like test the dynamic reclocking code also on the host
23:07karolherbst: the more I think about it, the more I want to have that
23:09karolherbst: mwk: never looked into how in llvm you define/add a new target
23:09karolherbst: mwk: is it mostly translation from a pseudo ISA to the real one and declaring stuff or is there more?
23:12mwk: karolherbst: it's complex
23:12mwk: first and foremost I have to write a mapping from so-called ISel DAG to MachineInstruction
23:13mwk: then I need some way to transform MachineInstructions to some output
23:13mwk: ie. assembly or binary
23:14mwk: the default is assembly, but binary is not much harder
23:14mwk: and if I do that, I can throw in an asm parser, and get an assembler for free
23:15mwk: that gives me a LLVM IR -> Falcon compiler
23:15mwk: then I have to write a simple target description for clang, and I get a C/C++ compiler
23:15karolherbst: I am looking through the documentation page currently, sounds a bit much indeed
23:15mwk: also, llvm has a linker, lld
23:16karolherbst: but we will only support static linking anyway?
23:16mwk: which also reuses the work on binary output
23:16mwk: of course
23:16mwk: but we'll have a problem with the limited RAM space on Falcons
23:17mwk: so we'll have to make some custom mechanism to do overlays
23:17mwk: and/or paging
23:17karolherbst: how much RAM space do we have?
23:17karolherbst: and how big is the stack for the registers?
23:17mwk: but I think that's mostly orthogonal to the compiler
23:18mwk: that depends on the Falcon
23:18mwk: some falcons have as little as 2.5kiB of code and 2.5kiB of data
23:19mwk: PMU has more, eg. 24kiB of code
23:19karolherbst: ahh okay
23:19karolherbst: could those be subtargets?
23:19mwk: uh, why?
23:19mwk: we don't care about code RAM size in the compiler
23:19mwk: we care about the ISA subset
23:19karolherbst: mhh, maybe we could just limit ourselves a bit and not page/whatever
23:20karolherbst: and then we say: there is just the data/code space and that's what we have
23:21karolherbst: and if you want to have some memory, use global/function static stuff and do the things
23:22mwk: uh?
23:23mwk: the RAM used for stack and globals/statics is exactly the same
23:23karolherbst: mhh, sometimes I still think too high-level...
23:23mwk: matter of fact, statics are worse
23:23mwk: stack only takes up space if the function is currently executing, global takes up space always
23:24karolherbst: right
23:24karolherbst: yeah, I was being a bit stupid
23:24karolherbst: have to remember the time I was developing on an ARM dev board, without an OS
23:24karolherbst: but that's like 3 years away now? :/