01:38karolherbst: imirkin: what would be a good way to determine the real throughput values of the specific instructions?
01:58karolherbst: mhh okay, without PRIME I get a 1% speed up in glxgears from 2.5 to 8.0 pcie speed
02:05mlankhorst: glxgears is just a benchmark for context switching..
02:07karolherbst: pcie speed still seems to affect the fps
02:08karolherbst: I think it's generally a good idea that commands to the gpu arrive faster, which reduces waits/stalls/whatever
02:08karolherbst: even when the perf gain is pretty small for most use cases
02:09karolherbst: now I even get around 3% more fps
02:09karolherbst: seems to have a bigger effect, when scheduling is better
02:18glennk: karolherbst, possibly some hints in https://devtalk.nvidia.com/default/topic/390366/instruction-latency/
02:20karolherbst: glennk: thanks, but mhh this is a bit old, things might be different on my gk106
02:24karolherbst: glennk: what do you think about this? http://www.stuffedcow.net/research/cudabmk
02:27glennk: mostly gt200 info isn't it?
02:28karolherbst: glennk: http://lpgpu.org/wp/wp-content/uploads/2013/05/poster_andresch_acaces2014.pdf
02:28karolherbst: this looks nice
02:29glennk: yeah the first page has the info you need
02:29karolherbst: but I still don't really get what the latency and what the throughput is and how I should schedule best
02:29glennk: i wouldn't put _too_much_ detail into the scheduler, makes it unstable
02:30karolherbst: mhh I just read the throughput values out
02:30karolherbst: ohh mhh
02:31glennk: int div on fermi, eek
02:32karolherbst: kepler seems to be the best anyway :P
02:32karolherbst: except floating point div
02:33karolherbst: ohh floating point is a bit bad overall
02:33karolherbst: no, this doesn't help me much sadly :/
02:33karolherbst: mhh maybe a little
02:34glennk: the values need to be scaled by expected thread occupancy
02:34karolherbst: the thing is, somehow I have to know when I can access a result of prior instructions, so I don't have waits
02:35glennk: ie these values are the hardware pipeline latencies
02:35glennk: if it can run 6 warps then for instance an add on maxwell the result is available immediately the next instruction
02:37glennk: and the max number of warps depends on several limits, #gprs being one of them, and that in turn depends a bit on the scheduler...
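One rough way to read glennk's point about occupancy, as a sketch only: assume warps are issued round-robin, one instruction per cycle, so a warp gets its next issue slot after `warps` cycles. The stall a dependent instruction sees is then roughly the pipeline latency minus the number of resident warps. The function name and the round-robin model are assumptions made for illustration, not documented scheduler behaviour; the 6-cycle figure in the usage note is only inferred from the Maxwell add example above.

```cpp
#include <algorithm>

// Hypothetical model: with `warps` warps issued round-robin, one per cycle,
// a warp's next instruction issues `warps` cycles after its previous one.
// A result with pipeline latency `latencyCycles` is late by the difference.
int visibleStallCycles(int latencyCycles, int warps) {
    return std::max(0, latencyCycles - std::max(1, warps));
}
```

Under this model, `visibleStallCycles(6, 6)` is 0 (the result is ready by the warp's next slot), while a single warp with the same latency would wait 5 cycles.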
02:39karolherbst: somehow I want to keep it simpler for now :D
02:39karolherbst: what could SM mean?
02:42glennk: one of these https://en.wikipedia.org/wiki/Fermi_(microarchitecture)#/media/File:Fermi.svg
03:23RSpliet: karolherbst: sadomasochism
03:23RSpliet: or rather, streaming multiprocessor
03:40mupuf: karolherbst: I pushed my nouveau tree here: HEAD
03:40mupuf: with the updated commit bit :D
03:40mupuf: it would be nice to have your tested-by
03:41mupuf: and btw, read the mailing list for the recent developments on this
03:41mupuf: I took apart my maxwell to do this analysis
07:28night199uk: does anyone understand the mode setting in nv50_display/corenv50?
07:44RSpliet: night199uk: well, presumably the author... you might want to ask more specific questions though
07:49pecisk: btw, is there any way to help with re-clocking efforts? Is there some data mining required? I have Kepler (GTX 760) and would like to use Nouveau for more gaming :)
07:52RSpliet: pecisk: that card presumably has GDDR5, in which case karolherbst has some patches that might help you out
07:53karolherbst: RSpliet: ohh yes, but that's not all. Although I usually know a way to stabilize 0f pstate on every kepler card
07:53RSpliet: I know
07:54RSpliet: pecisk: you might want to test some of his work to see if you get to high clocks with them; and provide feedback if it doesn't work ;-)
07:55pecisk: RSpliet, where can I get those patches? :)
07:56karolherbst: last one
07:56karolherbst: there are outstanding core voltage/timings bugs though
07:59pecisk: how to check my clocks when I run it? Have to run some game or just trying to switch them in background?
08:03karolherbst: pecisk: pstate sysfs file
08:04karolherbst: this scheduling thing isn't any fun if you don't know the hardware latencies for the instructions :(
08:04pecisk: so it is all a bit of guessing game
08:05karolherbst: pecisk: this was unrelated to the clocking stuff
08:06karolherbst: pecisk: do you know the pstate file?
08:07pecisk: . /sys/class/drm/card0/device/pstate this one?
08:14RSpliet: karolherbst: nor the pipeline properties, dual issue constraints... :-D
08:17RSpliet: but if you get to reduce register pressure, that's a known win :-)
08:17karolherbst: already done that
08:18karolherbst: lowered reg usage by 15% without perf loss
08:18RSpliet: but no perf wins either?
08:18karolherbst: around 3% in heaven
08:18RSpliet: okay, that's... maybe expected...
08:18RSpliet: you could look into ways of increasing data locality? idk, that might be possible
08:18RSpliet: reduce memory bus pressure
08:19RSpliet: increase cache utilisation
08:19RSpliet: or you could think of microbenchmarks that might reveal latencies somehow :-)
08:22karolherbst: RSpliet: do you think I might be able to hand assembly write some cuda kernels?
08:23RSpliet: the DDX stack uploads hand-assembled shaders
08:23RSpliet: and I'm sure there's more convenient code-paths too!
08:23karolherbst: I meant like I just write an assembly file and let the nvidia cuda toolkit throw out some "perf" stats or something
08:26pecisk: karolherbst, is it enough to compile and install it over the system's Nouveau driver? Or do I need additional deps?
08:27karolherbst: you can simply install it
08:27karolherbst: RSpliet: my algorithm seems to be good, only the throughput values have to be adjusted now :/
08:28karolherbst: RSpliet: gputest pixmark piano has around 3% lower frame times, too
08:29karolherbst: but hey, 3% more perf is still "good" somehow
08:31karolherbst: RSpliet: found a little compiler optimization though: https://gist.github.com/karolherbst/4b7459cc0b6a4bc4c532
08:31RSpliet: karolherbst: potentially
08:32karolherbst: if the first two instructions are better ordered, that will save two MOVs in post RA
08:32karolherbst: and the moves are something like MOV $r0 $r1; MOV $r1 $r2
08:33RSpliet: ah, that is unclear from your paste
08:33pecisk: karolherbst, cool, will do, thanks
08:33karolherbst: ohh wait, my movs are wrong
08:33karolherbst: ohh wait
08:33karolherbst: no, it's fine
08:34karolherbst: will show you post RA
08:34karolherbst: RSpliet: https://gist.github.com/karolherbst/4b7459cc0b6a4bc4c532
08:34RSpliet: but that could indicate that RA can be improved instead to eliminate useless movs :-)
08:34karolherbst: RA inserts those
08:35RSpliet: sounds more robust than scheduling for RA silliness
08:35karolherbst: but then RA has to switch instructions
08:36karolherbst: look at the post ra instructions
08:36RSpliet: oh right yes
08:36RSpliet: no, RA shouldn't switch instructions
08:37karolherbst: I found that inside glxspheres, because the compiler isn't deterministic
08:37RSpliet: there's no random input right?
08:37karolherbst: hashing by pointer not value
08:38karolherbst: and pointers are somehow random with limits
08:38RSpliet: ah ok yes
08:39karolherbst: though unordered_set does hashing by value only, but if you put pointers in it ...
08:39RSpliet: well, nondeterminism doesn't have to be a problem
08:40RSpliet: the redundant movs are ;-)
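To illustrate the pointer-hashing point above: a `std::unordered_set<Value*>` hashes the address, so its iteration order can vary from run to run with the allocator. A minimal sketch of one deterministic alternative follows, assuming each value carries a stable `id` assigned in creation order. The `Value` struct here is a hypothetical stand-in, not the real nv50 IR class, and this is not necessarily how the compiler should be changed.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for an IR value; `id` is assumed to be stable across runs.
struct Value {
    int id;
};

// Orders by the stable id rather than by address.
struct ByValueId {
    bool operator()(const Value* a, const Value* b) const { return a->id < b->id; }
};

using DeterministicValueSet = std::set<const Value*, ByValueId>;

// Returns ids in iteration order, which depends only on the ids themselves.
std::vector<int> idsInOrder(const DeterministicValueSet& s) {
    std::vector<int> out;
    for (const Value* v : s) out.push_back(v->id);
    return out;
}
```

The trade-off is O(log n) lookups instead of average O(1), in exchange for iteration that no longer depends on where the objects happen to be allocated.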
08:43karolherbst: mhh but the optimization might be somehow easy: if instruction uses range of reg, write regs starting from last
08:43karolherbst: or something like that
08:43karolherbst: mhh my paste has the wrong order...
08:45karolherbst: RSpliet: this is the tgsi https://gist.github.com/karolherbst/4b7459cc0b6a4bc4c532#file-glxspheres-shader
08:46karolherbst: yeah, stock mesa is stupid
08:47karolherbst: RSpliet: by the way, do you know why the blob sets regs to 0x0 sometimes without ever using them again?
08:51RSpliet: karolherbst: oh right, so TEX expands to those three instructions... well, I guess that the TGSI->nv50 IR step should become smarter about this
08:51karolherbst: there is more though
08:52karolherbst: this entire shader looks strange
08:56linfan: can nouveau handle 4 k monitors over displayport?
08:57imirkin: linfan: in theory, an SST 4k monitor should work. in practice, i don't know if it's been tried.
08:57linfan: so i need the nvidia blob then ok
08:59imirkin: karolherbst: ideally RA would be a little smarter about it all. the alignment stuff can be tricky though
08:59imirkin: karolherbst: however in some very controlled situations, a post-RA fixup would work
09:00karolherbst: I see
09:00karolherbst: but the two MOVs won't change much, will they?
09:01imirkin: well, the mov's are inserted so that RA has freedom to resolve conflicts
09:01imirkin: e.g. imagine you had a tex with coords foo, bar and bar, foo
09:01RSpliet: linfan: it might not work on account of some hard coded pixel clock limits which could be too low for 4K
09:02karolherbst: imirkin: yeah I know why RA did this, I was just wondering if that would have any effect on perf
09:02RSpliet: that's a known problem with some known solutions, none of which have been implemented to date :-P
09:02karolherbst: okay, one reg is used less for "some" shaders, but otherwise?
09:02imirkin: karolherbst: minimal i would think
09:02imirkin: RSpliet: that's for using higher hdmi frequencies. he's talking about DP.
09:03linfan: it worked on 60 Hz on the windows install, but on this old amd puter over hdmi only on 30 Hz, but going to put a 670gtx with display port and try
09:04imirkin: linfan: you need HDMI 2.0 to get > 30hz over hdmi for 4k
09:04imirkin: afaik there is no released hw with HDMI 2.0... maybe the very latest from amd/nvidia, dunno
09:04linfan: but that was useless
09:05linfan: but maybe a displayport on the other gpu would give a higher Hz
09:06linfan: but as i understand not worth to try nouveau, which I really like :(
09:06linfan: have to go for proprietary nvidia blob i presume
09:07imirkin: linfan: if it's a MST panel, i.e. there are 2 panels inside, it won't work with nouveau
09:07linfan: I cannot find any information on the net so I am prepared linux cannot handle 4k
09:07imirkin: linfan: if it's an SST panel, i.e. it appears as a single monitor to DP, then it should, in theory, work with nouveau
09:07linfan: you tell me 670gtx really dunno what panels are
09:07imirkin: panel = the monitor
09:08imirkin: but if you have all the hw on-hand, easiest to just plug it in and try rather than speculate
09:09karlmag: Maybe someone here might be able to enlighten me a bit. I have a Zotec 630 <something - can't remember exact card right now>. It has three physical outputs; 2xdvi + mHDMI. I can connect monitors to all outputs and saw all outputs in that kde-screen setup thingie, but I only get actual output on two monitors.
09:09linfan: it is a samsung lu28d590 with two hdmi and one display port
09:10imirkin: linfan: feel free to do the research about your monitor, i don't have time for that.
09:10karlmag: Is it possible to get output on all three monitors at the same time? If not, is the limitation in the hardware or the Nouveau driver, both or elsewhere?
09:10linfan: lol, need to install linux first, hahaaa, i just bought a 500 gb ssd
09:10imirkin: karlmag: lspci -nn -d 10de:
09:10karlmag: imirkin: gimme a couple of minutes to set up that actual machine first.
09:11karolherbst: imirkin: I bet you also don't know a good way to get the "right" throughput values?
09:11imirkin: karlmag: if it's a fermi, the limitation is in the hardware
09:11linfan: it will suffice with just one, for the moment the 4k monitor is just collecting dust on the floor
09:11imirkin: karlmag: if it's a kepler, should work...
09:11linfan: tx, will see what happens
09:11imirkin: karolherbst: that's a safe bet :)
09:12karolherbst: heaven gets a big perf drop, when instead of choosing any instruction I schedule those with highest throughput first
09:12karolherbst: where the opposite should be the case I assumed
09:16karolherbst: imirkin: another question, do you know why the blob sets regs to 0 sometimes without using them?
09:17karlmag: 01:00.0 VGA compatible controller : NVIDIA Corporation GF108 [GeForce GT 630] [10de:0f00] (rev a1)
09:17karlmag: 01:00.1 Audio device : NVIDIA Corporation GF108 High Definition Audio Controller [10de:0bea] (rev a1)
09:18imirkin: karlmag: that's a fermi, it only has 2 CRTC's
09:18karlmag: ah, right
09:18imirkin: karlmag: in theory you might be able to run all 3 screens if 2 of them are cloned, but i don't think that's a tested configuration
09:19karlmag: I still was kind of puzzled that all outputs are shown and can be manipulated in the kde display setup
09:19imirkin: karolherbst: they probably forget to remove the instructions, dunno
09:19imirkin: karlmag: well, all the outputs are usable... just not all at once :)
09:20imirkin: there isn't a great way for the hw to communicate such restrictions to userspace unfortunately
09:20karolherbst: imirkin: setting everything to 0x0 in sched hurts perf a lot :D fun value to play with
09:20karlmag: so hardware (or firmware) limitation then
09:21imirkin: karlmag: hardware, sadly
09:21imirkin: karlmag: all kepler and newer hardware has 4 crtc's
09:21karlmag: I think that's one of the single most annoying things about hardware (at least graphics cards) I can think of; How many monitors can I actually connect and get a picture on at the same time.
09:22karlmag: Unless you're "in the know" somehow it's basically impossible to tell
09:22karlmag: and those who sell the darn things tend to not know either. And don't mention Linux to them (usual blank stare)
09:22karlmag: I could rant about that for hours, but I'll stop now :-P
09:23imirkin: karlmag: well, the reason i asked you for lspci is that a GT 630 is either a GF108, GK107 or GK208
09:23imirkin: the latter 2 do in fact support 4 crtc's
09:23karlmag: Ah, right.. F vs K
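The "F vs K" rule of thumb from this exchange can be written down as a tiny lookup. This is a sketch that encodes only what is stated in this log (Fermi has 2 CRTCs, Kepler has 4) plus the assumption that a `GF` codename prefix means Fermi and `GK` means Kepler; anything else is reported as unknown rather than guessed. The function name is hypothetical.

```cpp
#include <optional>
#include <string>

// Maps a chip codename as printed by lspci (e.g. "GF108") to a CRTC count.
// Only the two families discussed in the log are covered.
std::optional<int> crtcCountFromCodename(const std::string& chip) {
    if (chip.rfind("GF", 0) == 0) return 2;  // Fermi: 2 CRTCs per the log
    if (chip.rfind("GK", 0) == 0) return 4;  // Kepler: 4 CRTCs per the log
    return std::nullopt;                     // not covered by this discussion
}
```

For the three chips sold as "GT 630", this gives 2 for GF108 and 4 for GK107 and GK208.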
09:23imirkin: so... you have the good marketing folk to thank for that
09:23karlmag: Yeah, I'll thank them with a baseball bat.. with rusty spikes
09:24imirkin: that's the way i always do it :)
09:24karlmag: *thumbs up* :-D
09:24karlmag: Well, I couldn't really beat the price for that card, that is I have two, actually
09:24karlmag: got them for free, so..
09:24imirkin: beggars can't be choosers?
09:25night199uk: RSpliet: heh, I wish i could
09:26karlmag: But - if I understand correctly - the inability to auto-config multiple graphics cards (nouveau) is a driver limitation?
09:26night199uk: RSpliet: i guess I’m wondering if someone who really knows about mode setting can look at some MMIO traces (not from nouveau)
09:27night199uk: really need some help from an expert, spent a bunch of days comparing mmio traces from nouveau and i don’t see a problem
09:27night199uk: let me see if i can figure out the author of those bits of code though, was it ben skeggs?
09:28night199uk: seems a lot of the mode setting code is very similar to some very old code from xf86 as well
09:28imirkin: karlmag: not sure what you mean by 'autoconfig'
09:29imirkin: karlmag: the multi-gpu situation on linux in general is not well supported
09:29imirkin: karlmag: however it does work for a lot of people.
09:29imirkin: as for whether your fancy gui tools work it all out... who knows. i never use those ;)
09:30night199uk: heh, i have that exact same card in my test box karlmag
09:31karolherbst: imirkin: maybe I should rather try to figure out "better" values for the scheduling data, I somehow get the feeling they have a bigger impact on perf than instruction ordering
09:31imirkin: karolherbst: could be. but apparently calim spent *quite* a while on that stuff, and got it "right"
09:31karlmag: imirkin: well, not having to fiddle with xorg.conf to get it to work, basically
09:32imirkin: karlmag: actually fiddling with xorg.conf precludes it from working, sadly
09:32karlmag: imirkin: didn't work when I didn't have an xorg.conf
09:32imirkin: karlmag: http://nouveau.freedesktop.org/wiki/Optimus/
09:32karlmag: and I remember reading something about most developers not having a multi card setup
09:33imirkin: you're looking for the "Using outputs on discrete GPU" section
09:33imirkin: it talks about intel/nvidia but it applies just the same to nvidia/nvidia
09:33imirkin: except instead of 'nouveau intel' just do '1 0'
09:34imirkin: some DE's set this up automatically, some don't
09:34imirkin: either way it's not a "nouveau" thing
09:36imirkin: librin: btw, can you confirm that mesa git fixes the WoW issue for you? it made the trace you provided replay fine for me.
09:36karlmag: imirkin: did make it work with 4 monitors tweaking the config mentioned here; http://nouveau.freedesktop.org/wiki/MultiMonitorDesktop/
09:36karlmag: 4monitors, 2 cards
09:36imirkin: karlmag: yeah that's the other approach
09:36imirkin: karlmag: which loses you acceleration, dynamic reconfig, etc
09:36imirkin: but... it works :)
09:36karlmag: but the flexibility left a bit to be desired
09:37imirkin: reverse prime is what the cool kids are using... but it kinda sucks too
09:37karlmag: I guess "choose your level/way of suckage here"? :-P
09:38imirkin: well... for some people reverse prime works fine
09:38imirkin: for some it doesn't
09:38imirkin: i haven't figured out what the correlation is
09:38imirkin: i suspect *some* measure of PEBKAC but i haven't identified what
09:39karlmag: heh.. not always so easy to pinpoint I know :-P
09:39karlmag: Hmm.. but if some developers are lacking cards (especially a bit older ones) I might be of help (at least over time).
09:40karlmag: I've had piles of graphics cards laying about, though the oldest ones are mostly tossed now I guess.
09:40imirkin: karlmag: well, i know RSpliet has been looking for G80-G98 gpu's... he's in the UK right now
09:40karlmag: Still I do get my hands on a bit older cards from time to time.
09:41karlmag: like this one; 01:00.0 VGA compatible controller: NVIDIA Corporation G84GL [Quadro FX 570] (rev a1)
09:42night199uk: heh, i found a second hand graphics card mall here in hong kong
09:42imirkin: karlmag: yes, a lot like that one.
09:42night199uk: they have some REALLY old stuff
09:42night199uk: isa s3 cards
09:42night199uk: for $5 or something stupid
09:42karlmag: night199uk: hehe.. I can imagine
09:42librin: imirkin, I'm running the blob right now, but as soon as I am done with some stuff, I'll boot to nouveau and test it. Along with Bug 90513, to see if it changes anything there.
09:42night199uk: that place is awesome
09:42night199uk: vesa gfx cards
09:42karlmag: I think I've tossed out most isa and a big part of the agp cards now
09:43imirkin: and i thought i was a pack rat...
09:44imirkin: librin: no rush, whenever you get to it
09:44karlmag: imirkin: I guess I should try to find more of the cards I have all over the place and figure out what I can give to "the cause" :)
09:44karlmag: imirkin: pack rat? depends..
09:45karlmag: You want my collection of old Unix hardware?
09:45karlmag: Sun, HP, Sgi, etc..
09:45karlmag: You have to pick it all up though.
09:45imirkin: karlmag: you should meet mattst88 :)
09:45karlmag: Does he want to pick up the stuff?
09:46imirkin: i'll let him answer that one... i do know he has a bit of a collection
09:50karlmag: Maybe I should hope he lives close enough then ;-)
09:50imirkin: he's in the US northwest i think
09:50karlmag: darn.. wrong continent
09:50imirkin: might be a bit of a drive then
09:51karlmag: Not really sure you can.
09:52karlmag: Ferries between N.America to Europe?
09:53imirkin: all depends on what kind of car you drive... maybe in one of those? http://www.bostonducktours.com/
09:53Karlton: drive through the North Pole
09:54karlmag: You certainly don't want anything looking like bad weather though
09:54imirkin: it's the pacific ocean... named for its calmness
09:55night199uk: is there any way to diagnose what’s actually coming out of a dvi port?
09:55night199uk: i’m guessing some kind of dvi breakout and a scope is the only way, right?
10:01thopiekar: Hi, I had problems in the past with your driver. The messages I get on dmesg were http://pastebin.com/CVj7m7cD .. I don't know whether there is a tool to check how much GDDR is used, but after checking it with nvidia drivers I found out that the GDDR was almost full. So increased the GDDR in BIOS from my DDR and nouveau and nvidia drivers were acting good after that.
10:02thopiekar: So, could you please check for this in the past? Maybe add a warning about an almost full GDDR? This would make finding the problem easier than guessing what's wrong :)
10:13mlankhorst: your vram should be used to the max..
10:23imirkin: thopiekar: not sure what you mean... you mean increasing the amount of stolen memory?
10:25thopiekar: imirkin: Increasing the amount of stolen memory isn't possible, is it? I'm thinking about a warning that the GDDR is simply full or going to be full. In that case the user can check the BIOS or whatever the system looks like and upgrade - in my case it was increasing the amount of GDDR in BIOS...
10:25specing: < karlmag> Ferries between N.America to Europe? <-- same boats involved in D-Day landings ;)
10:26imirkin: thopiekar: what do you mean by 'the GDDR'?
10:26karlmag: specing: heh... the landing crafts? don't think they're designed for that type of journey either :-P
10:26thopiekar: imirkin: I wasn't even thinking about the possibility that this could be a reason for those problems.
10:26thopiekar: GDDR=Memory, which is used by the GPU
10:26imirkin: thopiekar: NVAA/NVAC has no actual VRAM... it just uses system memory
10:27imirkin: aka "stolen memory"
10:27imirkin: the amount is configurable in the bios
10:28thopiekar: imirkin: I know. Sorry for mixing that up.. Won't those problems appear when having real VRAM?
10:29imirkin: thopiekar: probably, yeah. stolen system memory doesn't act any differently from physical VRAM attached to the GPU
10:29imirkin: however "those problems" aren't due to a lack of VRAM, although increasing the quantity of VRAM may work around them
10:31thopiekar: Hmm, ok. I thought it is some kind of out-of-memory problem or content is overwritten, so the GPU is acting wrong.
10:34Sokel: Reading through some of the various nouveau documents, it's not clear why I'm having a particular issue with my optimus laptop. I use my laptop screen and two monitors attached to a dock, which ends up using the nvidia GPU. This is fine with the proprietary driver (with some manual X and xrandr stuff). I wish not to use that. ...
10:35imirkin: Sokel: are you going to keep us guessing as to what your issue is?
10:35Sokel: When I use nouveau, everything "works", except random artifacts appear, images of the mouse cursor and other windows just stick around for whatever reason and they don't go away unless another window or something slides across the artifacts.
10:35imirkin: ah, that's a known issue
10:36imirkin: we're not entirely sure *why* it happens
10:36imirkin: however nouveau is just drawing whatever the intel gpu generates
10:36imirkin: so the issue is most likely somewhere in the intel ddx
10:36imirkin: are you using SNA?
10:43Sokel: I'm actually unsure if I am. Any way I could check?
10:43imirkin: Sokel: pastebin xorg log
10:43Sokel: One second good sir.
10:51thopiekar: imirkin: Maybe you know... Has the xorg-edgers // xorg-swat (Ubuntu) an IRC channel?
10:52imirkin: thopiekar: sorry no clue
10:52imirkin: thopiekar: i'd start in #ubuntu (i bet that's a channel)
10:54thopiekar: imirkin: Thanks :)
11:30Sokel: imirkin: http://www.fpaste.org/265778/14419098/ -- Sorry about the wait. Got pulled away for something else.
11:32imirkin: [ 41.081] (II) intel(0): SNA initialized with Ivybridge (gen7, gt2) backend
11:32imirkin: sorry, dunno.
11:32imirkin: i'd ask the intel folk
11:36Sokel: Thank you. I'll poke around.
11:46Sokel: imirkin: I made a config to make it so Intel gets its accelmode set to uxa (though I do know sna is probably better). The artifacts went away. Not sure for how long though.
11:47imirkin: perhaps you can convince ickle to debug it
12:19pmoreau: l1k: "BUG" messages are gone with the new version. :)
12:20pmoreau: l1k: I'll try what you suggested to see if it gets rid of "Read DPCD directly" being run every ~30ms.
12:22pmoreau: l1k: And btw, after switching a few times, Nouveau will crash while starting X due to an "Unable to handle kernel page request" in evo_wait. Which does not happen without your patches iirc. I'll try to debug this this week-end.
12:39pmoreau: l1k: Ok, just checked out deceb98, and both problems solved: continuous probing and the crash.
13:28l1k: pmoreau: thanks so much for the feedback! (and sorry for the delayed response, I was briefly AFK.)
13:29pmoreau: l1k: No problem! You can't always be behind your computer. ;)
13:30l1k: pmoreau: so weird that these problems disappeared. maybe it's not the reprobing but rather the proxying stuff? anyway sounds like the first few commits are safe, good to know.
13:30l1k: pmoreau: I'll try to reproduce it.
13:31pmoreau: I'll try to figure out the culprit(s) this week-end.
13:40linfan: so now i have my install ready in nouveau with the 4 k screen and it seems to work well
13:41linfan: 27" 16:9 3840x2160
15:45RSpliet: imirkin_ Re "nv50/ir: don't fold immediate into mad if registers are too high": damn, it's unbelievable how many things I overlooked with one seemingly harmless change
15:45imirkin_: hopefully that's the last of 'em
15:46imirkin_: the bigger issue wasn't your fault though
15:46RSpliet: well, no sure, I'm looking at your patch thinking "was it even possible to have more than 32 regs"?
15:46imirkin_: (interp instructions getting messed up when the dest was > r64)
15:47RSpliet: wise lesson learned today: make sure you know the *entire* ISA before you propose changes :-P
15:48imirkin_: that just leads to never making any changes
15:48RSpliet: yes true
15:48imirkin_: i prefer to just do stuff and hope for the best
15:49imirkin_: i reviewed + applied your changes, so it's not like there were obvious things
15:49imirkin_: but yeah, it does seem like they gave rise to a surprising quantity of fail
15:50RSpliet: oh I'm not worried about blame, but more like "wow, ok, next time be a lot more thorough in verification. Let's try harder in breaking things" :-D
15:55imirkin_: yeahhh... that's a nice thought
15:56imirkin_: i've kinda given up on trying to QA that sort of thing THAT carefully
15:57RSpliet: another argument to dedicate a K1 for regression testing :-P
15:58imirkin_: well it's not like these issues came up in piglit
15:58imirkin_: i guess ideally we'd just have a bunch of traces and compare 'dump-images' output
16:02RSpliet: sounds like not a bad plan; does it exist already?
16:02imirkin_: i keep a bunch of traces locally
16:02imirkin_: and i have a K1 that i've gotten to boot linux all of once
16:03RSpliet: it'd still only capture Kepler regressions though :-(
16:03RSpliet: getting SoCs to boot is an interesting process
16:06RSpliet: not sure what surprises Tegra has, I only set up boot envs for Versatile Express, STE U8500, Freescale IMX6Q, Altera SoCFPGA, Allwinner A10/A20 and Qualcomm APQ8064 :-P
16:09imirkin_: well, gnurou provided me with a script that worked. it would have taken the remainder of my natural life to come up with that script myself though.
16:10imirkin_: 8064 is pretty easy, it uses fastboot
16:10imirkin_: tegra (at least the TK1) works as a real computer though
16:10imirkin_: which makes it harder to use the way i wanted to
16:10RSpliet: mmm, I found U-boot easier to debug tbh
16:10imirkin_: i find it's easier not to have to debug bootloaders
16:11RSpliet: hmm, yes
16:11RSpliet: unfortunately, fastboot didn't do what I wanted it to do straight away
16:13imirkin_: well, i was trying to use the jetson tk1 like i used fastboot -- feed it a kernel and watch it boot
16:13imirkin_: it's very clearly not designed for that though
16:13RSpliet: mmm, uboot *can* do tftp boot
16:13RSpliet: that said
16:13imirkin_: that's not what i was looking for
16:14imirkin_: that's it pulling a kernel
16:14imirkin_: i want to feed it a kernel and watch it boot ;)
16:14imirkin_: anyways, it's all achievable -- gnurou's script is somewhere on my box, hopefully i can find it again
16:14imirkin_: the second you start messing with tftp, everything can go wrong
16:15imirkin_: e.g. some idiot forgot to turn off the *other* dhcp server on your local network and they're fighting
16:15RSpliet: well, I fed my router (MIPS something something) a new image through TFTP
16:15RSpliet: that was successful :-P
16:15imirkin_: (took me a while to figure out why the mac g5 i got was getting a very wrong ip address)
16:16RSpliet: can you not get U-Boot to be a TFTP *server*? eg: wait for someone to upload *stuff*
16:17RSpliet: idk, it sounds kind of silly if I think of it, but there must be a use-case for it
16:22imirkin_: RSpliet: well, gnurou provided me with a way to ship the kernel along with u-boot
16:24RSpliet: if it works :-)
16:24imirkin_: it worked once
16:24imirkin_: [not like i tried and failed a second time... just... haven't had a lot of time to try]
16:24imirkin_: i think the move is to use gbm on there. i wonder if glretrace can work with gbm...
16:25RSpliet: TFTP server doesn't seem to be too tidy. You can stick it in an endless loop in your env to avoid timeouts that's not the problem, but having a kernel and a DTB as nice separate files with a little script to glue stuff together is clearly not the use case
16:27RSpliet: imirkin_: congratulations btw
16:27RSpliet: you are now a "Prolific Open-Source Contributor"
16:27RSpliet: and "independent"
16:27RSpliet: whatever that may mean
16:27imirkin_: RSpliet: do i get a certificate?
16:27imirkin_: well, 'dependent' is a tax status in the US... i've been filing independently for quite a while now though
16:28RSpliet: no, but you do get a nice webpage on phoronix.com that you can claim to be a certificate if you want
18:28marcosps: imirkin: http://pastebin.com/jte7mGD7
18:28marcosps: I'm just curious about the insnCanLoad.
18:28imirkin_: + Storage &reg_i = i->getSrc(1)->asImm()->reg;
18:28imirkin_: that's wrong
18:29imirkin_: you're told which source of i is being considered
18:29marcosps: imirkin: but, the code doesn't reach that part at all...
18:29imirkin_: because sf == FILE_IMMEDIATE is false
18:29imirkin_: because it's a merge
18:30marcosps: imirkin: You're right. I forgot about it.
18:32marcosps: imirkin_: So... about the insnCanLoad, how can I make the code enter there? Should I change the IMM declarations or something?
18:33marcosps: imirkin_: for now, I'm just trying to print the values of i and ld here to verify what's the right value here..
18:33imirkin_: dunno... the simple thing would be to make it detect the condition
18:33imirkin_: i.e. if "ld" is actually a merge, check what it's merging
18:33imirkin_: and if it's merging immediates AND the instruction has a 64-bit sType, then treat it as an immediate
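The check imirkin describes (is "ld" a merge, and is it merging immediates?) can be sketched on a toy IR. Everything below is a hypothetical stand-in and not the real nv50 IR classes or their API; in particular, the assumption that source 0 of the merge is the low 32-bit half and source 1 the high half is made for illustration only, and the 64-bit sType check on the consuming instruction is left out.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins, NOT the real nv50 IR types.
enum class Op { Mov, Merge, Other };

struct Node {
    Op op;
    std::optional<uint32_t> imm;    // set when this node produces a 32-bit immediate
    std::vector<const Node*> srcs;  // for Merge: {low half, high half} (assumed order)
};

// If `ld` merges two 32-bit immediates, return the combined 64-bit value;
// otherwise report that it cannot be treated as an immediate.
std::optional<uint64_t> mergedImmediate(const Node& ld) {
    if (ld.op != Op::Merge || ld.srcs.size() != 2) return std::nullopt;
    const Node* lo = ld.srcs[0];
    const Node* hi = ld.srcs[1];
    if (!lo || !hi || !lo->imm || !hi->imm) return std::nullopt;
    return (static_cast<uint64_t>(*hi->imm) << 32) | *lo->imm;
}
```

A caller would then only treat the result as a loadable immediate when the consuming instruction also has a 64-bit source type, as described above.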
18:38marcosps: imirkin_: Hum... I'll create a new test to verify if the sf is a MERGE, and thus check the ld here.
18:53marcosps: imirkin_: in peephole.cpp if the op is not LOAD or MOV, it doesn't call insnCanLoad. Should I change it to accept OP_MERGE too?
18:54imirkin_: you mean in LoadPropagation?
18:54imirkin_: it skips over the whole thing if it doesn't think it's a load
18:55imirkin_: you need to convince it otherwise
18:55marcosps: imirkin_: yes, ok, I got it!
20:24marcosps: imirkin_: http://pastebin.com/sunCT3QZ
20:24marcosps: it seems I'm doing something wrong here, because it's not printing the OK message there.
20:24imirkin_: ld->getSrc(0)->getInsn()->getSrc(0)->reg.type == TYPE_F64
20:24imirkin_: probably not what you want
20:25marcosps: I need to check for both cases to be 32b, right?
20:25imirkin_: you probably want something a bit cleverer like
20:25marcosps: imirkin_: Or, in this case, I need to test against the double set before insnCanLoad?
20:26imirkin_: take a look at ValueRef::getImmediate()
20:26imirkin_: among other things, you want to use that
20:26imirkin_: but perhaps it'd be wise to even just extend it
20:26imirkin_: so that it accounts for the merge use-case
20:27marcosps: imirkin_: hum...
20:27imirkin_: although... unclear if you want to be playing the modifier game
20:28imirkin_: although, shouldn't hurt. either abs or neg only modify the high bits
20:29marcosps: imirkin: so, I need to verify what's this Modifier thing :)
20:30imirkin_: ehhhh... don't really worry about those for now
20:30imirkin_: but basically an instruction source (i.e. valueref) may have modifiers
20:30imirkin_: like abs or neg
20:32marcosps: imirkin_: Ok... I'll change the getImmediate to verify OP_MERGE and test it in insnCanLoad.
20:32marcosps: imirkin_: Do you think this is the right/best solution?
20:32imirkin_: not sure
20:33imirkin_: i haven't really thought too much about it
20:54marcosps: imirkin: http://pastebin.com/zKqnjeii
20:55marcosps: damn, why is this crashing for me? I tried to call "src->insn->op == OP_MERGE"
20:58marcosps: imirkin: but, at least in target_nvc0, I think I implemented as you suggested: http://pastebin.com/dxBXQrmk
21:01imirkin_: if (typeSizeof(ld->dType) != 8) -- should also check i->sType
21:01imirkin_: a.reg.type != TYPE_F64)
21:01imirkin_: that will never be true
21:02imirkin_: you need to check that only the high 20 bits are set
21:02imirkin_: i.e. that a == 0 && (b & 0xfff) == 0
21:02imirkin_: and that s == 1
21:02imirkin_: otherwise the immediate can't be loaded
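The bit test described here can be tried out on ordinary doubles. A minimal sketch, assuming `a` is the low 32-bit word and `b` the high 32-bit word of the IEEE 754 double; the function name is made up, and the `s == 1` condition from the discussion is not modelled because its meaning is not spelled out in this log.

```cpp
#include <cstdint>
#include <cstring>

// True when only the top 20 bits of the 64-bit pattern can be non-zero,
// i.e. sign, the 11 exponent bits and the top 8 mantissa bits.
bool fitsInHigh20Bits(double value) {
    uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);               // well-defined type pun
    const uint32_t a = static_cast<uint32_t>(bits);        // low word
    const uint32_t b = static_cast<uint32_t>(bits >> 32);  // high word
    return a == 0 && (b & 0xfff) == 0;
}
```

Values such as 1.0, 0.5 and -2.0 pass, while 0.1 does not because its mantissa extends into the low bits. Note the parentheses around `b & 0xfff`: in C++ `==` binds tighter than `&`, so leaving them out would change the meaning.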
21:09marcosps: imirkin: thanks!
21:10marcosps: But, about the getImmediate, do you have any tip about looking the op == OP_MERGE there?
21:13marcosps: it's strange because src is tested before my addition...