04:18karolherbst: mupuf_: sometimes at readout my core counter (only set to 0x1) is higher than the ticks counter :/
04:18karolherbst: although I read the ticks counter out last
04:18karolherbst: and reset it first
04:19karolherbst: any idea?
04:20karolherbst: it seems I have to workaround this a bit, but I would rather not :D
04:35karolherbst: best way to do reg = min(reg, 0xff) on the falcon?
04:36karolherbst: cmp, bra, set? or is there something better
04:37karolherbst: and with set I mean imm32
04:53mupuf_: yes, I would suggest you just clamp the value
05:39karolherbst: mupuf_: yeah, but this is kind of ugly in the code, because I have to do that for 4 counters now :/
05:40karolherbst: ohh wait
05:40karolherbst: I could skip the div then
05:40karolherbst: still 4 cmps, meh
05:54mupuf_: make a function?
05:55karolherbst: add a min function to arith.fuc?
06:05RSpliet: does it not already exist?
06:09karolherbst: didn't find it
06:10RSpliet: to my surprise, it doesn't seem to, no :-P
06:10karolherbst: min is really ugly to write in assembly code :/
06:12karolherbst: isn't there an instruction to just conditionally skip the next one
06:12RSpliet: I take it falcon is fairly risc
06:14karolherbst: RSpliet: it seems there isn't even a min/max instruction on x86
06:14karolherbst: there is some SSE4 stuff though :D
06:17RSpliet: ARM I think was able to do conditional execution of instructions
06:18karolherbst: mhh
06:18karolherbst: it just bothers me, that I need to add an extra label just for that :/
06:33karolherbst: ohhh wait
06:33karolherbst: mupuf_: I think the issue is not the counter being greater than the ticks, but nearly equal and I get 0x100 as the result :/
06:34mupuf_: hehe, clamp it
06:34mupuf_: make a function for it
06:35karolherbst: yeah, min is also used for this, so I can use it there
06:39karolherbst: yay, now it works
06:40karolherbst: mhhh
06:40karolherbst: I could also divide the ticks by 0xff instead of doing a shr 8
06:40karolherbst: :/
06:42karolherbst: yeah, this may make more sense in the end
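The off-by-one described above can be reproduced with plain integer arithmetic. The following is a minimal sketch, assuming the load is computed as `counter / (ticks >> 8)`; the function names and sample values are hypothetical and not taken from perf.fuc.

```python
def clamp_u8(value):
    """Clamp a non-negative value to the 8-bit range, i.e. min(value, 0xff)."""
    return min(value, 0xff)


def load_shift(counter, ticks):
    """Scale counter against ticks >> 8 (assumed original approach)."""
    divisor = ticks >> 8
    if divisor == 0:
        return 0
    return counter // divisor


def load_div(counter, ticks):
    """Scale counter against ticks // 0xff and clamp the result."""
    divisor = ticks // 0xff
    if divisor == 0:
        return 0
    return clamp_u8(counter // divisor)


# A counter that is nearly equal to the tick count overflows 8 bits with
# the shift variant, but stays in range with the divide-by-0xff variant.
ticks = 0x100ff
counter = 0x10000
print(hex(load_shift(counter, ticks)))  # 0x100
print(hex(load_div(counter, ticks)))    # 0xfe
```

The clamp is kept in `load_div` because a very small tick count can still push the quotient above 0xff.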
06:44karolherbst: anyway I cleaned up the implementation and use 3 registers less, so this is also nice :D
06:57karolherbst: here is my fuc code: https://github.com/karolherbst/nouveau/blob/3b0404ea0f647b84ca3ae0306a05fe4ee266265d/drm/nouveau/nvkm/subdev/pmu/fuc/perf.fuc
06:57karolherbst: do you want to check if I did something terribly wrong regarding how fuc works?
06:58karolherbst: the ld(b8, $r11, #perf_eng_pcie) shl b32 $r11 8 looks dodgy to me, but it works
06:58karolherbst: so I don't know
07:04mupuf_: karolherbst: was $r11 cleared before?
07:04mupuf_: otherwise, I do not see what is dodgy here
07:05karolherbst: does it matter if it's cleared? ohhh it does on tesla :O
07:05mupuf_: as long as you never read more than 16 bits of r11, I do not see the problem here
07:05karolherbst: I write all bits of it on fermi+ that is
07:29mupuf_: why would tesla be different?
07:30mupuf_: I would say clear a reg before using it if you do not want to remember which parts of the reg got init where
07:30mupuf_: and, always clear a reg you are going to use at the begining of the function
07:55karolherbst: mupuf_: on tesla I only shift 3 8 bit values into $r11, because it isn't possible to read the pcie load
07:55karolherbst: so only core, memory, video
07:57karolherbst: mupuf_: also I am not quite sure when to use these pushed regs and when these "high" ones. I somehow got that the lower ones are pushed (cleared?) before and then reset from stack at the end. The high ones are usually used for functions parameters
07:57karolherbst: but do I have to reset those high regs to their old values at the end of the function, too?
07:58mupuf_: ok, so this is all down to conventions
07:58karolherbst: like if I am using $r13 somewhere, does the caller expect $r13 to be its old value after call, when I do not document it?
07:58karolherbst: yeah I figured
07:58mupuf_: r0 should always be 0
07:58karolherbst: in perf_counter_readout I use $r14
07:58mupuf_: r1-r9 are callee-saved, which means that if a function replaces them, it should restore them afterwards
07:58mupuf_: r10-r15 are caller-saved
07:59mupuf_: which means that if you want to keep useful data in it, you need to save them before calling a function
07:59karolherbst: okay
07:59karolherbst: so after a function call r10-r15 can be whatever
07:59mupuf_: yes
07:59karolherbst: okay
07:59mupuf_: that's how I remember it
07:59karolherbst: yeah, it figures a bit that way
07:59mupuf_: I wrote a commit to really enforce this
08:00karolherbst: so instead of doing push $r1...
08:00karolherbst: I could also use $r12 for example
08:00karolherbst: as long as I don't call functions
08:00mupuf_: http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=e38efaee0b89c50979c52b52dbe7df5dc943bf1a
08:00mupuf_: yes
08:00karolherbst: okay
08:01karolherbst: which I guess is preferred or doesn't it matter
08:01mupuf_: well, less instructions == good
08:01karolherbst: yeah I figured
08:02karolherbst: okay, then I can replace $r1 and $r2 in perf_counter_readout with $r13 and $r12
08:02karolherbst: because I don't call anything
08:02karolherbst: or is nv_iord considered a call?
08:02karolherbst: or other macros
08:02mupuf_: macros are not calls
08:03mupuf_: and they usually use r0 to do everything
08:03karolherbst: okay
08:03mupuf_: and then clear r0 again
08:03mupuf_: but check the code
08:04mupuf_: example: http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=eabe8bca383c91286db70e5aad2d33ac44a9595a
08:04karolherbst: what is this nv_iord thingy all about?
08:05karolherbst: ohh
08:05karolherbst: I get it
08:05karolherbst: :D
08:06karolherbst: ohh the falcons prior to gk208 can only mov 16bit?
08:06karolherbst: good to know
08:11mupuf_: hence the function :p
08:12mupuf_: I really like the fuc ISA
08:12mupuf_: it is pretty simple and useful
08:12mupuf_: oh, and in your code, you could have used the instructions to add the value straight at the right place in the register
08:12mupuf_: without needing to shift
08:12mupuf_: that would save one instruction
08:12mupuf_: but meh
08:27mwk: heh
08:27mwk: I have no idea why they went with variable-length opcodes
08:27mwk: otherwise, it's quite nice
08:28mwk: maybe xtensa overdose
08:34mupuf_: :p
08:34mupuf_: mwk: I am fixing the platform code before pushing it
08:34mupuf_: not that I had to do much
08:34mupuf_: I added a comment to tell that there is no risk of exposing gpus older than the tk1
08:35mupuf_: I checked the device trees to be sure
08:35mupuf_: I still need to figure out how to map the damn bar1
09:10mupuf_: mwk: hmm, I really don't know how to get the address for bar1...
09:10mupuf_: I found the info in the device tree, but the exposed version in sysfs does not seem to contain it
09:10mupuf_: that can't be right
09:26karolherbst: mupuf_: actually that would save like 3 shifts
09:26mupuf_: yes, I meant one instruction per value
09:26karolherbst: okay
09:26karolherbst: it makes sense nevertheless
09:26karolherbst: which instruction is it?
09:26mupuf_: but you are right, the first one does not need the shift
09:26mupuf_: look for extr
09:26mupuf_: you'll find the opposite one :D
09:27karolherbst: ins?
09:35mupuf_: sounds good
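The difference between shifting values into a register and inserting them at the right bit position can be sketched as follows. This is only an illustration in Python: `insert_field` is a hypothetical helper that models a generic bitfield insert, not the exact semantics of the falcon `ins` instruction.

```python
def pack_shift(values):
    """Pack 8-bit values by shifting the register left before each OR."""
    reg = 0
    for v in values:
        reg = ((reg << 8) | (v & 0xff)) & 0xffffffff
    return reg


def insert_field(reg, value, shift, width):
    """Replace `width` bits of reg, starting at bit `shift`, with value."""
    mask = ((1 << width) - 1) << shift
    return (reg & ~mask) | ((value << shift) & mask)


def pack_insert(values):
    """Pack 8-bit values by inserting each one at its final position."""
    reg = 0
    count = len(values)
    for i, v in enumerate(values):
        reg = insert_field(reg, v, 8 * (count - 1 - i), 8)
    return reg & 0xffffffff


loads = [0x12, 0x34, 0x56, 0x78]  # e.g. four 8-bit load values
print(hex(pack_shift(loads)))   # 0x12345678
print(hex(pack_insert(loads)))  # 0x12345678
```

Both variants produce the same packed word; the insert variant simply avoids the separate shift step per value.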
09:35karolherbst: mhh
10:07m3n3chm0:nasZ
10:10karlmag: oh.. tegra
10:12karlmag: I did see some patches for GM206 too recently, didn't I?
10:23karolherbst: mupuf_: can I access stuff from the data segment directly with instructions?
10:23karolherbst: I thought I always have to read them into registers first
10:24mupuf_: nope, you will need to load it first
10:24karolherbst: mhh
10:24karolherbst: then I would need one instruction more using ins
10:24karolherbst: because I already load into the return register directly
10:30karolherbst: mupuf_: want to look over the patches before I send them to the ML?
10:40mupuf_: oh right!
10:40mupuf_: and yes, I guess I can have a look
10:49karolherbst: mupuf_: thanks https://github.com/karolherbst/nouveau/commits/pdaemon_counters
10:49karolherbst: just the four newest one
11:33imirkin: who's wwa? that username sounds really familiar...
11:38mwk: imirkin: he work[ed?] for pathscale
11:40imirkin: ahh ok
12:02mupuf_: imirkin: I gave him the push rights when moving envytools out of pathscale's repo
12:02mupuf_: of course, the CTO was pissed that he lost the exclusivity in choosing who gets in or not, but hey, it is open source stuff
12:03mupuf_: and we were not ok with the model
12:07karolherbst: sounds like fun
12:08mupuf_: karolherbst: yeah, I guess...
12:09karolherbst: though I am also totally unaware of pathscale, I just noticed they have like a modified version of nouveau or something, but I don't really know what they are up to or anything
12:16mupuf_: well
12:17mupuf_: the idea is that they wanted to propose a new GPGPU approach
12:17mupuf_: a new programming model
12:17mupuf_: but at first, it required a modified version of nouveau
12:17mupuf_: then, nouveau could be used
12:18mupuf_: and then mwk reverse engineered the IOCTLs of the blob and that allowed them to do all they wanted
12:18mupuf_: and we do not hear too much from them anymore
12:18mupuf_: as in, 0 communication actually
12:18mupuf_: but they employed calim and mwk for some time, and gave them hw
12:18mupuf_: they also got me started
12:19mupuf_: but they were promising stuff they could achieve though
12:19karolherbst: mhh
12:20karolherbst: these envytools commit kind of sounded like they actually know a thing about how to schedule those instructions
12:24tacchinotacchi: meanwhile
12:24tacchinotacchi: anybody's working on fermi's reclocking?
12:24tacchinotacchi: it just seems to have been forgotten by everyone
12:25karolherbst: I already added it on my todo list, but there is stuff more important for me I want to do first
12:26karolherbst: tacchinotacchi: you could help though if you would like :)
12:27tacchinotacchi: i would really,really like to help
12:27tacchinotacchi: but i'm too noob for reclocking
12:27tacchinotacchi: actually i have trouble understanding any of the code of nouveau
12:27karolherbst: mhh :/
12:28tacchinotacchi: i have some experience reading and playing around with other people's projects, but nouveau among all is the one i understand less
12:28karolherbst: I always thought it is one of the more clean projects overall :D
12:28tacchinotacchi: well, among the ones i tried
12:29tacchinotacchi: for example i tried to push a new feature to obs-studio
12:29tacchinotacchi: which of course is a lot easier than nouveau, just for the fact that obs studio is a simple tool and nouveau a reverse engineered kernel module
12:29karolherbst: well, I would say the kernel module is simpler in design
12:30karolherbst: nobody cares about bad software quality in userspace, kernelspace on the other hand ....
12:30karolherbst: you don't want to include bad quality there :p
12:30mwk: wish that was true...
12:31karolherbst: well, I saw so much crappy software
12:31karolherbst: :D
12:31mwk: heh
12:31mwk: I suppose there are lot of projects that fare worse than the kernel
12:31tacchinotacchi: the last time i tried to read nouveau, i took 8 hours to find one of the functions that was doing the actual "sending data to registers"
12:32karolherbst: mwk: know the omg/sec metric? :D
12:32tacchinotacchi: to find out i had no idea when that function got called and what it was for
12:32imirkin: tacchinotacchi: roy has a couple patches... nothing close to working though: https://github.com/RSpliet/kernel-nouveau-nv50-pm/
12:32tacchinotacchi: i already talked with rspliet more than once
12:32imirkin: tacchinotacchi: nouveau has recently been rewritten... there's a bit less indirection now and a *lot* more types
12:33tacchinotacchi: a lot more types? weren't they enough? :D
12:33karolherbst: there are _never_ enough types :D
12:33tacchinotacchi: i'll give it a shot but i'll never do anything useful
12:33tacchinotacchi: frankly the little knowledge i have is more c++ stylish
12:33karolherbst: memory reclocking code is though though
12:34karolherbst: *tough
12:34tacchinotacchi: i don't even know what does it mean when a function starts with __
12:34karolherbst: internal use
12:34karolherbst: do _not_ use
12:34mwk: I've had the pleasure of working on gdb recently, and I'm currently at the 3rd recursion level of debugging the debugger after hitting a showstopper
12:34karolherbst: :D
12:34karolherbst: once I was thinking about debugging gdb
12:34mwk: if you ever want to try the reverse debugging feature... think of the good old printf as debugging tool
12:34karolherbst: but then my mind stopped me
12:35karolherbst: yeah
12:35karolherbst: printf helps fixing more bugs than gdb
12:35karolherbst: :)
12:35tacchinotacchi: i agree
12:35imirkin: there's a time and place for each
12:35karolherbst: I heard about a kernel debugger somewhere :D
12:35imirkin: i hate it when people refuse to do one or the other
12:35karolherbst: never used it though
12:36karolherbst: mh I really rarely use gdb directly, I like some graphical debugger though in C++ projects
12:36tacchinotacchi: i feel like even before studying the code i should study the structure of the gpus
12:36karolherbst: mhhh
12:36tacchinotacchi: i mean the nvidia ones specifically
12:36karolherbst: I don't think this is really necessary
12:37tacchinotacchi: like rspliet's got a commit on his git
12:37tacchinotacchi: Enable reclocking for GDDR3 G94-G200
12:37tacchinotacchi: what the sht does it mean
12:37imirkin: ?
12:37karolherbst: tesla
12:37imirkin: seems pretty obvious...
12:37imirkin: what part of that commit message do you not understand?
12:37tacchinotacchi: how does he distinguish individual gddr3 models
12:37karolherbst: tacchinotacchi: http://nouveau.freedesktop.org/wiki/CodeNames/
12:37tacchinotacchi: there it is karolherbst
12:38karolherbst: well, the vbios tells you which memory type the card has, maybe there are other ways as well, don't know
12:39karolherbst: the commit message is a bit wrong by the way
12:39karolherbst: should be G94-GT200
12:39karolherbst: :D
12:39tacchinotacchi: oh that's why i didn't find g200
12:39tacchinotacchi: xD
12:39tacchinotacchi: i understand now
12:40imirkin: it's unclear whether it's G200 or GT200
12:40imirkin: the chips are marked G200
12:40tacchinotacchi: i'd think that's stuff that usually happens in companies
12:40tacchinotacchi: they know it's the same, nobody cares and keep it like that
12:41karolherbst: or nobody has time to actually fix it :D
12:41tacchinotacchi: anyway, i'd have no idea how to check in the vbios
12:41karolherbst: tacchinotacchi: install envytools
12:41karolherbst: there is a tool called nvbios
12:41karlmag: maybe it should be called NVA0(G200/GT200) on the page then?
12:41tacchinotacchi: so that's why i say i should study the (little) documentations before the code
12:41karolherbst: tacchinotacchi: but dmesg should tell you which memory type you have
12:42tacchinotacchi: i can read most C code, but it's useless if i don't understand what it's for
12:42tacchinotacchi: karolherbst: i have systemd, should i be able to find that in journald?
12:42karolherbst: don't know, I have systemd and dmesg works
12:42tacchinotacchi: because i think i have no messages from nouveau
12:42tacchinotacchi: i have some from i915
12:42tacchinotacchi: no nouveau
12:42karolherbst: ohhh, laptop fun
12:42karolherbst: sounds like, fun
12:43tacchinotacchi: yea you see
12:43karolherbst: is nouveau loaded?
12:43tacchinotacchi: haha
12:43tacchinotacchi: not now, i have bumblebee now
12:43tacchinotacchi: going to eat, hit you when i come back
13:02karolherbst: mupuf_: the 0x100 bit is really strange
13:03karolherbst: but also helpful on kepler cards with memory load
13:03karolherbst: or core waiting on memory stuff
13:03karolherbst: the thing is, on some cards it produces some high values at idle
13:03karolherbst: around 65/255 of the ticks
13:03karolherbst: but on load it goes higher (depending on what exactly you do)
13:04karolherbst: and when you upclock memory (leaving core as it is) it decreases
13:06karolherbst: same goes for the PMFB bit
13:07mupuf_: :)
13:08karolherbst: it will be fun figuring a nice bit combination out
13:08karolherbst: :D
13:08karolherbst: the thing currently in my branch is good enough for core, pcie and video
13:09karolherbst: but useless for memory
13:09karolherbst: but it at least works somehow for all cards
13:09karolherbst: thing is, I don't get anything useful with the config the blob uses
13:10karolherbst: mupuf_: we need to create benchmarks which are targeting one engine directly, like one which does a lot of memory stuff
13:10karolherbst: while running this, the memory counter should be at full load
13:11mupuf_: not sure I get your point
13:11karolherbst: we need to verify that our counter configuration does, what we think it does
13:12karolherbst: otherwise benchmarking dyn reclocking is useless when the counters produce false reports
13:12tacchinotacchi: ok i'm here
13:13karolherbst: mupuf_: 7,8,9 and 12 are kind of useful stuff here, but I don't know for what exactly
13:14mupuf_: oh, ok, but this is for REing
13:14mupuf_: I guess it should be trivial to write a shader that does a shitton of memory reads
13:14karolherbst: yes
13:14karolherbst: we need also one with a stalling core waiting on memory operations
13:14karolherbst: because this is more troubling
13:15karolherbst: there are situations where the core load counter decreases after upclocking the memory only
13:16mupuf_: Yeah, more fun! :D
13:19tacchinotacchi: i have no idea what you're talking about :D i'll just let you do your work and not disturb
13:19tacchinotacchi: here i am like the silly kid who wants to play in the team thinkin he's going to help but in reality he is useless
13:21karolherbst: there are easy task everybody can do though
13:22karolherbst: mupuf_: how did you read the load in your pdaemon tool? read the regs or something else?
13:22imirkin: tacchinotacchi: best not to start with huge unsolved problems
13:28mupuf_: read the regs
13:29karolherbst: mupuf_: ever got like a high memory load? Or don't you remember
13:29karolherbst: and with high I mean above 80%
13:43tacchinotacchi: can you give me a good entry point?
13:43tacchinotacchi: after this i'll try not asking anything
13:44imirkin: tacchinotacchi: what are you interested in?
13:45imirkin: whole bunch of tasks listed at https://trello.com/b/ZudRDiTL/nouveau but they won't make much sense without further explanation
13:46mupuf_: no, I struggled to get it higher than 36% IIRC
13:47tacchinotacchi: imirkin: i would like to say i'm interested in reclocking, but i really need a general insight before. i can't start from there
13:48imirkin: why are you interested in reclocking?
13:48tacchinotacchi: because i want to use nouveau myself
13:49imirkin: does it already work fine for you except for the clock speeds?
13:50tacchinotacchi: what other problem could i have? except for a single game not finding the opengl version, rendering works fine
13:50tacchinotacchi: i've been forced to use the proprietary all along, so even if i get an amd before finishing any code i want to help other people
13:53tacchinotacchi: https://trello.com/c/WKKbOXeI/45-auto-downclock-when-overheating this looks about right
13:54mupuf_: tacchinotacchi: sure, that should be kind of trivial to implement
13:55mupuf_: you may have to rework the code handling the temperature though
13:55mupuf_: but that really is a simple project to understand, so it is a good starting point
13:55mupuf_: ping me if you do not understand the documentation in envytools
13:55mupuf_: rnndb
13:56karolherbst: mupuf_: I already thought so
13:56karolherbst: mhh, well we could actually do better than the blob here :D
13:56mupuf_: what are you talking about?
13:57karolherbst: better memory load counter
13:57mupuf_: ah
13:57mupuf_: well, before doing better, we need a way to quantify that :D
13:57karolherbst: I think I found some good configuration on my card, but it didn't work on other kepler
13:57karolherbst: yeah
13:58karolherbst: tacchinotacchi: I will probably do the downclock on overheat thingy :/
13:59karolherbst: mupuf_: that should be also done on the pmu, right?
13:59karolherbst: at least for gt215+ that is
13:59tacchinotacchi: karolherbst: i can checkout to this commit and do it myself in a worst way
13:59mupuf_: karolherbst: no you won't, there are two separate things
13:59karolherbst: imirkin: I don't think we should do that in the kernel at all
13:59mupuf_: and the one I was referring to in this trello entry is the hw-based downclocking
13:59karolherbst: mupuf_: ohh right, there are alarm events
13:59mupuf_: yes, and the FSRM
13:59tacchinotacchi: even if you do that before (and you will) it'll still be an exercise for me
13:59karolherbst: I see
14:00mupuf_: tacchinotacchi: don't worry, no-one is going to steal your work, it is not something we need right now, we need good reclocking and good performance to actually overheat a GPU :D
14:00karolherbst: :D
14:00karolherbst: right
14:00tacchinotacchi: what has envytools got to do with downclocking?
14:01mupuf_: I suggest you read all the doc we have on the FSRM, found in the rnndb/pm/therm.xml file
14:01mupuf_: oh, and you are in luck, I have written a paper on this
14:01mupuf_: more doc
14:01tacchinotacchi: that trello entry refers to activating an already done reclocking function when the card overheats, i don't have to communicate with the card
14:01tacchinotacchi: i'm too tired now, imma watch tbbt
14:02tacchinotacchi: get back to you tomorrow if i have time
14:02karolherbst: mupuf_: how does this hw based downclock on overheat work in the end then? Is there some saved emergency clock level the card puts itself into when a special temp threshold is hit?
14:03mupuf_: tacchinotacchi: http://phd.mupuf.org/files/xdc2013-nvidia_pm.pdf <-- section IV-C and D
14:03mupuf_: karolherbst: Ah ah, you really never read what I sent you :D
14:03mupuf_: they do not reclock any PLL
14:03mupuf_: they basically have a divider by a power of two on the clock
14:03mupuf_: that you can set
14:04karolherbst: ahhhh
14:04mupuf_: and then, you can select how long you want to spend on the divided clock vs the normal clock
14:04karolherbst: so some kind of emergency switch :D
14:04mupuf_: yes
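The effect of such a divider can be estimated with a simple weighted average. This is a rough sketch under stated assumptions: the function name, parameters and sample numbers are hypothetical, and the real FSRM register layout is not modelled.

```python
def effective_clock(base_mhz, div_log2, divided_fraction):
    """Average clock when part of the time is spent on a divided clock.

    base_mhz:         normal clock frequency in MHz
    div_log2:         the divider is 2 ** div_log2 (power of two)
    divided_fraction: share of time spent on the divided clock, 0.0 to 1.0
    """
    if not 0.0 <= divided_fraction <= 1.0:
        raise ValueError("divided_fraction must be between 0 and 1")
    divided_mhz = base_mhz / (2 ** div_log2)
    return base_mhz * (1.0 - divided_fraction) + divided_mhz * divided_fraction


# Hypothetical example: 800 MHz clock, divider of 4, 25% of the time divided.
print(effective_clock(800, 2, 0.25))  # 650.0
```

No PLL is touched in this model; only the divider and the time share change the average frequency.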
14:04karolherbst: then it starts to make sense
14:04mupuf_: read the damn section C and D and be done with it, it is just one page
14:04karolherbst: but what about the dynamic reclocking code in the meantime?
14:06mupuf_: I am reviewing your patches now
14:07tacchinotacchi: meanwhile
14:07tacchinotacchi: i remember once nouveau would expose a thermal sensor
14:07tacchinotacchi: i just loaded nouveau and can't see it anymore
14:07karolherbst: tacchinotacchi: it still does
14:07karolherbst: at least for me
14:07karolherbst: tacchinotacchi: what does "sensors" give you?
14:08tacchinotacchi: http://pastebin.com/iLNm4Bwb
14:08tacchinotacchi: it doesn't change if i load nouveau
14:08karolherbst: weird
14:08mupuf_: tacchinotacchi: well well well
14:09mupuf_: so, what chipset do you have?
14:09tacchinotacchi: i see
14:09tacchinotacchi: the card is shut down
14:09mupuf_: ah, that may explain, indeed
14:09tacchinotacchi: optimus
14:09mupuf_: runpm=0 will help
14:09mupuf_: but it is bad for the battery life
14:09tacchinotacchi: imma read the nouveau page and turn it on
14:10karolherbst: tacchinotacchi: you may want to power on the card before loading nouveau
14:10karolherbst: otherwise nouveau might not find your gpu
14:11karolherbst: I have the same problem, so I alway power the gpu on before loading nouveau
14:11tacchinotacchi: i thought nouveau had control of power management
14:11tacchinotacchi: ok i'll do
14:11karolherbst: well it works for me the first time
14:11tacchinotacchi: i'll load bbswitch, turn it on, unload bbswitch and load nouveau
14:12karolherbst: but after that, the pci config is a bit screwed
14:12mupuf_: tacchinotacchi: the paper is not really up to date, but I checked the sections C and D, they are OK
14:12imirkin: bumblebee messes everything up... you have to be very careful when combining it with nouveau.
14:12mupuf_: E is almost fully understood now
14:12mupuf_: and almost implemented
14:12karolherbst: imirkin: though this is a bbswitch issue, and the pci device class get screwed up for whatever reason :/
14:13tacchinotacchi: shit
14:13tacchinotacchi: nouveau crashed Xorg
14:13karolherbst: well well well
14:14tacchinotacchi: but now it's loaded
14:14karolherbst: I know that
14:14tacchinotacchi: anyway thank you for the paper
14:14karolherbst: tacchinotacchi: this bug: https://bugs.freedesktop.org/show_bug.cgi?id=91388
14:14tacchinotacchi: you could have told me before :D
14:15karolherbst: I forget about it
14:15karolherbst: I have a little patch to prevent it though
14:15karolherbst: if you want to compile your X server yourself, I will gladly give it to you :D
14:15karolherbst: but now the temperature should show up
14:16tacchinotacchi: yay
14:17tacchinotacchi: the card is still shut down, but nouveau's got the control over its power state
14:17tacchinotacchi: so sensors shows the nvidia card, but the temp is N/A
14:17mupuf_: ack
14:17mupuf_: karolherbst: why did you write the first patch?
14:18mupuf_: https://github.com/karolherbst/nouveau/commit/01704fe6cc760d39e64825d087dc325a7458d387 --> I do not see any difference with gf119's
14:18karolherbst: mupuf_: because kepler has PCOPY2
14:18karolherbst: ohh wait
14:19karolherbst: mupuf_: the #define GK104 0xe4 thing should have been enough, shouldn't it?
14:19mupuf_: no
14:19tacchinotacchi: ok started glxgears, got card temp, all OK
14:20tacchinotacchi: except it's running at 200 mhz
14:20karolherbst: ahh right, no, it is fine then
14:20karolherbst: I was worried for a second I did something useless :D
14:20karolherbst: no, I need to check against GK104 at some points
14:20mupuf_: karolherbst: please explain why you make the commit though, because it was a big WTF otherwise
14:20karolherbst: k
14:25tacchinotacchi: btw let's say there's a chip for which reclocking works
14:25tacchinotacchi: is it still manual? or it depends on load?
14:26karolherbst: tacchinotacchi: manual, but we are working on it currently
14:26karolherbst: will take time though
14:26karolherbst: mupuf_: If I make it b8 I get alignment warnings/errors, what should I do about those?
14:27tacchinotacchi: ok thank you all
14:27mupuf_: karolherbst: are you doing a st b32 ?
14:28karolherbst: no, in the data segment itself
14:28karolherbst: in the end I had b8 on kepler for all
14:28karolherbst: and b16 for perf_eng_mc on earlier
14:28karolherbst: I meant fermi
14:29mupuf_: maybe it is because you need to add an align instruction at the end of the data segment
14:29mupuf_: that is weird though
14:29karolherbst: on tesla I have only three fields
14:29karolherbst: 3x 8bit
14:29karolherbst: and it only complained there
14:30karolherbst: "stdin:198.1-198.16: Warning: Unaligned data .b32; section 'gt215_pmu_data', offset 0xd13"
14:31mupuf_: align .b32 or something like that after your 8 bits values
14:31mupuf_: it will automatically insert the necessary padding
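The padding needed for the offset in the warning above can be worked out by rounding up to the next multiple of the alignment. A minimal sketch, assuming the alignment is given in bytes; `align_up` is a hypothetical helper, not part of envyas.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


# Offset reported in the warning: a .b32 placed right after three 8-bit values.
offset = 0xd13
aligned = align_up(offset, 4)
print(hex(aligned))      # 0xd14
print(aligned - offset)  # 1 byte of padding
```

Three 8-bit fields leave the next 32-bit value one byte short of a 4-byte boundary, which matches a single padding byte here.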
14:34karolherbst: k
14:35mupuf_: would be nice if there were a method that would return the format of your output
14:35mupuf_: but it's ok I guess
14:36mupuf_: well, the entire thing looks pretty good, good job!
14:39karolherbst: thanks
14:40mupuf_: now, I think I will just go to bed and try to adjust my sleep schedule a tiny bit :D
14:40karolherbst: :D
14:47hakzsam: imirkin, Hi, I'm back for ... two minutes. :) I don't have time to have a look at your "upload-time fixups" patch right now (even if I'm really interested in), but I'll have a look tomorrow or so.
14:52imirkin: hakzsam: don't worry about it, i don't really expect meaningful reviews for the nouveau patches i send
14:53imirkin: hakzsam: if you like, i can explain what's going on
14:53imirkin: hakzsam: also, i probably have a new version of that patch i haven't posted... check my tree, master branch
14:53imirkin: hakzsam: allllso, i'm trying to get it all (mostly) working on nv50, but running into some trouble. that code is nasty.
14:56karolherbst: mupuf_: .align 2 helps :/
14:56mupuf_: karolherbst: do .align 4
14:56karolherbst: okay, this is also good
14:56karolherbst: same code
14:58karolherbst: mupuf_: should I remove these polling interval stuff?
14:59karolherbst: I don't know if there is a _good_ reason to ever change it
15:00hakzsam: imirkin, okay, we'll discuss about it tomorrow then. I really need to go to my bed because I have to wake up in less than 6 hours. see you!
15:06imirkin: karolherbst: hey, mind tracing a few piglits for me on blob + kepler?
15:07imirkin: karolherbst: the ones that airlied added in this branch: http://cgit.freedesktop.org/~airlied/piglit/log/?h=arb_bindless_texture
15:07karolherbst: what shall I do?
15:10karolherbst: imirkin: give me the command I shall run and I will
15:10imirkin: fetch airlied's branch
15:10imirkin: and build the new tests
15:10imirkin: then do ...
15:10karolherbst: already built it
15:11imirkin: bin/arb_bindless_texture-bindless-texture -fbo -auto
15:11imirkin: (a) does that pass?
15:12imirkin: oh heh, looks like it doesn't actually test anything. great. anyways, mmt log of that would be great
15:12karolherbst: http://filebin.ca/2KJ2kPJ2SKZY/arb_bindless_texture.mmt.xz
15:12airlied: yeah I just noticed it doesn't pass
15:12karolherbst: :O
15:12airlied: it should draw a nice checkerboard
15:12karolherbst: it passes here
15:12airlied: or it doesn't test rather
15:13karolherbst: with the blob
15:13airlied: does it draw?
15:13karolherbst: yes
15:13airlied: drop the -fbo -auto that is
15:13karolherbst: 4 squares
15:13karolherbst: colors
15:13airlied: cool
15:13airlied: that should let imirkin work things out the
15:13airlied: then
15:14karolherbst: blue white
15:14airlied: karolherbst: does the layout one pass?
15:14karolherbst: red green
15:14karolherbst: what shall I run?
15:14karolherbst: and which branch :D
15:14airlied: bin/arb_bindless_texture-layout -fbo -auto
15:14airlied: same one
15:15karolherbst: nope
15:15karolherbst: fail
15:15karolherbst: well
15:15karolherbst: "fail"
15:15karolherbst: Unexpected GL error: GL_NO_ERROR 0x0
15:15karolherbst: Expected GL error: GL_INVALID_OPERATION 0x502
15:15airlied: what text number?
15:15imirkin: 00000028: fc001f06 8013c008 tex p lauto live dfp $r0:$r1:$r2:$r3 t2d $t8 $s0 $r0:$r1 ()
15:15airlied: test number
15:16imirkin: well, that's the non-bindless version
15:16karolherbst: airlied: (Error at /home/karol/Dokumente/repos/piglit/tests/spec/arb_bindless_texture/bindless_texture_layout.c:137) this?
15:16karolherbst: ohh don't know
15:16airlied: karolherbst: it should print failed test x
15:16karolherbst: ohhh
15:16karolherbst: 0
15:16karolherbst: failed test 0
15:16airlied: sounds like nvidia got the default wrong possibly
15:17imirkin: well, it could also be smart and notice that it can just do it this way
15:17airlied: imirkin: it shouldn't be able to be that smart :)
15:17airlied: or rather being that smart will use the CPU you are trying to save by having bindless :)
15:18imirkin: what it's saying is that it's using a fixed "8" index into the (texture, sampler) binding list
15:18airlied: I should probably dump the handle in the test
15:18airlied: karolherbst: can you printf texhandle in tests/spec/arb_bindless_texture/bindless_texture.c
15:19airlied: and rebuild/rerun the squares test
15:19karolherbst: when?
15:19airlied: after it gets assigned
15:20airlied: imirkin: that's fixed at compile time?
15:20airlied: maybe the binding list has different semantics or something
15:21karolherbst: airlied: for whatever reasons it doesn't print :/
15:21karolherbst: printf("%li\n", texhandle); after texhandle = GetTextureHandle(texid);
15:23imirkin: forgot to rebuild?
15:23karolherbst: I rebuilt
15:24karolherbst: "[ 34%] Linking C executable ../../../../../bin/arb_bindless_texture-bindless-texture"
15:24imirkin: forgot to save? :)
15:24karolherbst: ohh wait
15:24karolherbst: this is another test
15:25karolherbst: mhhh
15:25karolherbst: airlied: maybe you got me the wrong file?
15:26karolherbst: airlied: maybe you meant tests/spec/arb_bindless_texture/bindless_texture_layout.c ?
15:26karolherbst: 4294969856
15:26karolherbst: got this
15:27imirkin: interesting. 0x100000a00.
15:27imirkin: i see that address uploaded
15:27imirkin: PB: 0x00000020 GK104_3D.CB_POS = 0x20
15:27imirkin: PB: 0x00000a00 GK104_3D.CB_DATA[0] = 0xa00
15:27imirkin: PB: 0x00000001 GK104_3D.CB_DATA[0] = 0x1
15:28imirkin: and 0x20 corresponds to $t8
15:28imirkin: neat.
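[editor's note] The 0x20 and $t8 pairing above is consistent with 4-byte slots in the texture constbuf (8 * 4 = 0x20). A minimal sketch of that arithmetic follows; the 4-byte slot size is an assumption inferred from these two numbers, and the helper names are hypothetical, not taken from nouveau or the blob:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each $tN slot occupies 4 bytes in the texture constbuf.
 * This is inferred from CB_POS = 0x20 matching $t8, not from documentation. */
#define TEX_SLOT_SIZE 4u

/* Byte offset at which slot N would live (hypothetical helper). */
static inline uint32_t tex_slot_offset(uint32_t slot)
{
    return slot * TEX_SLOT_SIZE;
}

/* Slot index for a given byte offset; the offset must be slot-aligned. */
static inline uint32_t tex_slot_from_offset(uint32_t offset)
{
    assert(offset % TEX_SLOT_SIZE == 0);
    return offset / TEX_SLOT_SIZE;
}
```

With these helpers, tex_slot_offset(8) gives 0x20, the CB_POS value seen in the pushbuf dump.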
15:30imirkin: bbbbuuuttt... i don't see anything get stored there.
15:34imirkin: gah. this indirection is killing me.
15:35airlied: imirkin: so maybe it does some math on the handle
15:36imirkin: that's not an address
15:36imirkin: it just looks like one
15:36imirkin: in actuality it's the (sampler, texture) values
15:37imirkin: so it does the remapping from handle -> texture as well
15:38airlied: so it must just give out handles that are texture, sampler combined, and when you make things resident it puts the values in the correct place
15:39imirkin: yep
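[editor's note] The printed handle 4294969856 is 0x100000a00, and its two 32-bit halves are exactly the two CB_DATA words in the dump (0xa00, then 0x1). A small sketch of that split, assuming nothing beyond what the log shows; which half is the sampler and which is the texture is not stated above, so the helpers only say "lo" and "hi", and all names are hypothetical:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Low 32 bits of a 64-bit bindless handle (first CB_DATA word in the log). */
static inline uint32_t handle_lo(uint64_t handle)
{
    return (uint32_t)(handle & 0xffffffffu);
}

/* High 32 bits of the handle (second CB_DATA word in the log). */
static inline uint32_t handle_hi(uint64_t handle)
{
    return (uint32_t)(handle >> 32);
}

/* Rebuild the 64-bit handle from its two halves. */
static inline uint64_t handle_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Print a handle in hex with a format that is correct for any 64-bit
 * unsigned type; "%li" only happens to work where long is 64 bits wide. */
static inline void print_handle(uint64_t handle)
{
    printf("0x%" PRIx64 " (lo=0x%" PRIx32 ", hi=0x%" PRIx32 ")\n",
           handle, handle_lo(handle), handle_hi(handle));
}

/* Self-check against the value karolherbst pasted. */
static inline void check_logged_handle(void)
{
    const uint64_t h = UINT64_C(4294969856);
    assert(handle_lo(h) == 0xa00u);
    assert(handle_hi(h) == 0x1u);
    assert(handle_join(0xa00u, 0x1u) == h);
}
```

Printing the handle in hex in the test would have made the match with the pushbuf words visible right away.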
15:39airlied: sounds like the gallium interface should be fine for that as well
15:40imirkin: wellllll
15:40imirkin: yeah probably
15:40imirkin: as long as i can rely on some sort of consistent mapping between the argument to the TEX instruction
15:40imirkin: and ... the resident thing
15:40imirkin: or something
15:41airlied: well the constant value is the resident thing handle
15:41airlied: the args to the TEX are just going to be CONST[x]
15:41imirkin: but what is x?
15:42imirkin: i.e. can i use it for something useful?
15:42airlied: offset into the standard constant buffer
15:42imirkin: bleh that won't work
15:42airlied: which you read from to get the handle
15:42imirkin: too much indirectly
15:42imirkin: indirection*
15:42airlied: yes welcome to bindless
15:43airlied: it's all about indirection
15:43imirkin: well, nvidia blob seems to be able to get rid of a lot of it
15:43airlied: well they pass the handle in a const buffer don't they?
15:43imirkin: that's how texturing is done in the first place
15:43imirkin: and it's a special constbuf
15:43imirkin: one dedicated for texture handles
15:44airlied: oh so we would need a new BINDLESSCONST buffer
15:44airlied: at gallium instead of using CONST[x]
15:44imirkin: right.
15:44imirkin: i mean, it's a regular constbuf
15:44airlied: and GLSL level also
15:44imirkin: but
15:44imirkin: it has to be predeclared
15:44imirkin: sticking it into a UBO would work just fine
15:45airlied: but you need the index into the constbuf to be meaningful
15:45imirkin: PB: 0x80020982 GK104_3D.TEX_CB_INDEX = 2
15:45imirkin: and then you do something like this:
15:45imirkin: PB: 0x00000021 GK104_3D.CB_BIND[0] = { VALID | INDEX = 2 }
15:45imirkin: (which binds a particular address to cb2 for a particular shader stage)
15:45airlied: I should write a test mixing a bunch of bindless types
15:46imirkin: and then you just upload handle pairs into that cb
15:46imirkin: and the tex instruction reads them
15:46airlied: I suppose on radeonsi we don't have limits on const buffers anymore
15:46imirkin: so it can be a perfectly generic constbuf, but it can't be the global constbuf
15:46airlied: so it would work there as well
15:46imirkin: i.e. if you have st/mesa declare a UBO and stuff the handle values into it, that'd be perfectly fine for nvc0
15:46airlied: probably should have this conversation in front of marek :)
15:47imirkin: he tends to ask questions i don't have answers to :)
15:47airlied: I'll finish the API bits off anyways, and we can look at that then
15:48airlied:hotel food time &
15:49karlmag:perks up, then realizes that he's not at a hotel..
15:49imirkin: i don't think airlied delivers
15:49karlmag: bummer
16:00pmoreau: Ha! stack_addr + user_param[0] == local_addr
16:01pmoreau: Finally managed to get some use of the pointer stored in user_param[0]
16:02pmoreau: But… Is the stack in shared memory? Cause IIRC, local isn't.
16:02imirkin: pmoreau: stack, or s[]?
16:02imirkin: s[] = shared
16:03imirkin: afaik there's no way to access the stack memory [easily]
16:04pmoreau: I am getting a pointer from shared, but I don't find any mention of shared in the mmt trace, apart from its size
16:04imirkin: i forget exactly how it works
16:04imirkin: iirc you specify a window of gmem that acts as shared memory
16:04imirkin: [and has to be accessed via s[] ]
16:06pmoreau: Oh wait, I'm mixing things: the pointer I get from s[] doesn't have to be local to shared! :D
16:08imirkin: the pointer you get out of s[] is to gmem
16:10pmoreau: gmem is global memory, right?
16:10imirkin: sssort of
16:10imirkin: it's a 32-bit window into regular VM memory
16:11pmoreau: :/
17:25imirkin: pmoreau: also there's some sort of index -- i think you can have up to 15 separate windows into the gpu vm
18:21airlied: imirkin: I think you can also pass the handles via vertex attributes
18:22airlied: otherwise I'm not sure why VertexAttribL1ui64ARB is required
18:23imirkin: err.... how would they make it to the shader?
18:25imirkin: oh i see
18:26imirkin: any sampler type(uvec2) // Converts a pair of 32-bit unsigned integers to
18:26imirkin: // a sampler type
18:26airlied: you can have sampler inputs
18:26imirkin: i'll have to trace code that does that :)
18:26airlied: I'll have to write code to do it :)
18:33imirkin: i wonder if there's a way to load the tic/tsc numbers directly
21:27Dan39: hi \o
21:28Dan39: had a game running on 1 monitor (the dark mod) and was looking at irc on other. i clicked a link and soon as firefox popped up everything froze. got these errors in dmesg: http://dpaste.com/3N4PV0T eventually it restarted, after a few attempts to kill -9 Xorg which did nothing :|
21:28Dan39: i fear my gfx card may need another reflow... any comment based on the errors?
21:31Dan39: tbh, the dmesg output colors are quite nice, so here's a screenshot: http://i.imgur.com/u4weiZP.png
21:32Dan39: and im sure someone will enjoy that being their team's colors :P
21:32imirkin: the new dmesg builds like to do that... not a big fan myself
21:33imirkin: those errors aren't overly specific as to what's going on... those invalid_cmd things are errors that pop up every so often for various people, no idea why
21:33Dan39: i never get them
21:33Dan39: those messages are all from when it froze
21:34Dan39: the rest of logs from several days has no nouveau messages
21:34Dan39: and i was surprised i couldnt even switch to a console with ctrl+alt+F1
21:35imirkin: cmd submission was probably hosed
21:35imirkin: surprised it recovered
21:35Dan39: yea
21:35Dan39: i think X and wm restarted completely, it went back into lightdm
21:35imirkin: yeah, often these sorts of situations lead to overall hangs
21:37Dan39: the part that i am interested about is "trapped read at .."
21:37Dan39: but i have absolutely no idea what any of it means
21:37imirkin: that's most likely a result of bogus command submission
21:39Dan39: got this in xsession-errors...
21:39Dan39: XIO: fatal IO error 4 (Interrupted system call) on X server ":0"
21:39Dan39: after 441113 requests (441113 known processed) with 0 events remaining.
21:39Dan39: xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0"
21:39imirkin: coz X exited
21:39imirkin: all running clients got the shaft
21:40Dan39: heh
21:40Dan39: i see now, xterm
21:40Dan39: and whatever XIO is
21:41Dan39: thanks for the info
21:41Dan39: will idle but g2g. peace
23:35imirkin: gnurou: so does upstream nouveau work with the TX1? [with your patches]
23:36imirkin: i assume that there's a bunch of shader work that needs to be completed for that...
23:39gnurou: imirkin: you're correct on both points
23:39gnurou: imirkin: my priority now is to enable secure boot on dGPU, then we can concentrate on shader issues
23:40gnurou: hopefully we can get some cooperation from the people who know
23:40imirkin: cool
23:40gnurou: but getting FECS/GPCCS signed firmwares to load is on the way to anything else anyway
23:40imirkin: sure
23:40gnurou: I hope you guys will like this version of secure boot... it took a while to put together
23:41imirkin: well, it's all up to skeggsb ... i don't really understand any of that stuff
23:42gnurou: it's down to ~1750 lines now, and almost reads like a story book, so hopefully he will like it :P
23:43gnurou: we could probably make things simpler, but we have little control over the format of the different firmwares, and I prefer to use them as-is if possible
23:44gnurou: for instance, PMU firmware comes from different people, and in a different format than FECS/GPCCS, requiring a different loading function... I'm trying to find a way around this, but everything may not end up completely consistent
23:47imirkin: what does the PMU firmware do again? reclocking?
23:48gnurou: imirkin: on GM20B I am not sure it does that yet - I suppose we will need to document its interface when we release it officially anyway
23:54imirkin: gnurou: what is it good for?
23:57gnurou: imirkin: giving me headaches, apart from that I'm not too sure... :P
23:57gnurou: I'm sure mupuf_ knows more about it than I do
23:58imirkin: hehehe
23:58imirkin: i'm sure the headaches bit is mentioned as the main purpose in its design doc :)
23:58gnurou: but apparently it is involved with some host-triggered power-management functions, like ELPG
23:59gnurou: therm and fan control as well it seems?
23:59gnurou: from what I know more and more PM functions will be moved to it
23:59mupuf_: imirkin: pdaemon does power gating, load monitoring, fan management
23:59mupuf_: hw scheduling (executing scripts with the card off bus)
23:59mupuf_: it also talks to the power sensor
23:59imirkin: right, i knew about the reclocking bit, wasn't 100% sure about the rest