14:02 mardination: what i think is that, if anyone still had doubts, the theoretical win on some of the technical dilemmas goes to me. however, i am not made of steel, i make mistakes too. i commented on some of it only briefly, because going from philosophy to implementation details is rough going
14:03 mardination: if someone wants to talk about implementation details, we can do that, but this requires you to do what i asked
14:04 mardination: i.e. put your nose into the miaow code, and if there are questions that i can answer, i may comment, or we may comment together.
14:07 mardination: having doubts about people's skillset or qualifications, or doubts that the code is not as good as the philosophy learned from papers, that it works only... roughly
14:08 mardination: this can all be cured by taking the whole pot on and going through it line by line, which i also did
14:12 mardination: in general I did not believe that designers who are paid 5k or more per month to do hardware design are underqualified. in general i more or less believed that it all works; from time to time you get the question of how, and then exactly how
14:17 mardination: i have not yet read all the TCE TTA VLIWs, but miaow is very good in how it does things, it is the best that i have seen so far -- i think the TCE TTAs are good as well. synopsys i have not looked at, it is commercial, maybe they used a cadence/synopsys generator there. my understanding is that i did not break any law, as always
14:17 mardination: the code was published under the MIT license
16:10 mardination: generally, to sum this up: i picked one central, most important file (it is all subjective), fetch/round_robin.v. this file feeds f_decode_wfids to the chip, and it also centrally fetches opcodes from the pcblocks
16:12 mardination: it is based on the vacant wfid entries, which is... vacant_wr_reg = instr_done_en | wr
16:13 mardination: that means that if the dispatcher does not register a dispatch, because wr = wf_dispatch_i,
16:13 mardination: the whole chip cannot feed valid bits to the decoder and issue module via the wavepool
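The write-enable expression quoted above, and why nothing can be fetched without a dispatch, can be sketched as a small behavioral model. This is only an illustration under assumptions: the signals are treated as per-wavefront bitmasks, `NUM_WF`, `occupied` and `round_robin_pick` are hypothetical names, and the selection loop is a generic round-robin scan, not a transcription of fetch/round_robin.v.

```python
NUM_WF = 4  # hypothetical slot count, kept small for readability


def vacant_wr_reg(instr_done_en: int, wf_dispatch_i: int) -> int:
    """Write enable for the vacancy register, as quoted in the log:
    vacant_wr_reg = instr_done_en | wr, with wr = wf_dispatch_i.
    Both inputs are assumed to be per-wavefront bitmasks."""
    wr = wf_dispatch_i
    return (instr_done_en | wr) & ((1 << NUM_WF) - 1)


def round_robin_pick(occupied: int, last: int):
    """Generic round-robin scan (an assumption, not the MIAOW logic):
    return the next occupied slot after `last`, or None if no slot is occupied."""
    for step in range(1, NUM_WF + 1):
        idx = (last + step) % NUM_WF
        if (occupied >> idx) & 1:
            return idx
    return None
```

With no dispatch and no completed instruction, `vacant_wr_reg(0, 0)` stays 0, and with no occupied slot `round_robin_pick` has nothing to hand to the decoder, which matches the point that a missing dispatch starves the rest of the pipeline.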
16:14 mardination: so in other words, decode_wr_data, the wr port for the read muxes in instr_info_table.v, is in one of two states
16:14 mardination: depending on whether the chip fed some valid entries (f_decode_valid) or not
16:15 mardination: if the dispatcher is clamped to fail, no valid entries will be fed, and the chip takes the previous issued_rd_data from the flops
16:16 mardination: however, when valid entries are fed, it takes the currently fetched and decoded instruction using the full pipeline length, where the earlier one can be managed with the lsu
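The two states of decode_wr_data described above behave like a 2:1 mux in front of a flop. A minimal sketch, assuming a single table entry and integer-encoded instructions; the class name `InfoTableEntry` and the `clock` method are hypothetical and only model the behavior as described in the log, not the actual contents of instr_info_table.v.

```python
class InfoTableEntry:
    """Hypothetical one-entry model of the write path described in the log."""

    def __init__(self) -> None:
        self.issued_rd_data = 0  # value currently held in the flops

    def clock(self, f_decode_valid: bool, decoded_instr: int) -> int:
        # Mux: take the freshly decoded instruction when the fetch side
        # delivered a valid entry, otherwise recirculate the stored value.
        decode_wr_data = decoded_instr if f_decode_valid else self.issued_rd_data
        self.issued_rd_data = decode_wr_data  # flop update on the clock edge
        return decode_wr_data
```

Calling `clock(True, 0xAB)` followed by `clock(False, 0xCD)` returns `0xAB` both times: the second call ignores the new data because no valid entry was fed.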
16:16 mardination: now, the detail that i already mentioned before was this.
16:17 mardination: if the stream meets a dependency,
16:17 mardination: meaning that for whatever reason the instruction cannot be issued,
16:17 mardination: then the instruction's decode instance will leave the valid bits on
16:19 mardination: during such a moment, it feeds the alus directly from the decode module's issue_opcode, without even
16:19 mardination: going through the issue flops, but only through the scoreboard flops
16:20 mardination: in other words, issue_opcode and its brothers have two drivers
16:20 mardination: but one of them is not driven conditionally
16:20 mardination: in verilog this means a tri-state buffer technique,
16:21 mardination: to avoid transistor damage or whatever from multiple drivers on the same wire/net or port
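For readers unfamiliar with how two drivers can share one net, here is a small model of generic Verilog wire resolution with tri-state buffers: a disabled driver outputs high impedance (z), and the net takes the value of whichever driver is active. This is a sketch of the general language behavior for a single bit only; it does not confirm how issue_opcode is actually driven in MIAOW, and `tri_buf` and `resolve` are hypothetical helper names.

```python
Z = "z"  # high impedance: the driver is switched off
X = "x"  # unknown: active drivers disagree


def tri_buf(enable: bool, value: int):
    """Tri-state buffer: pass `value` when enabled, otherwise release the net."""
    return value if enable else Z


def resolve(*drivers):
    """Resolve one bit of a wire from all of its drivers."""
    active = {d for d in drivers if d != Z}
    if not active:
        return Z             # nobody is driving the net
    if len(active) == 1:
        return active.pop()  # one value wins (or all active drivers agree)
    return X                 # contention between different values
```

As long as at most one buffer is enabled at a time, `resolve` returns that driver's value; enabling both with different values yields `x`, which is the contention case the tri-state scheme is meant to avoid.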
16:27 mardination: anyways this method is very clever, and definitely one good and right way to do it
16:36 mardination: why it carries some importance: because it is the fast fallback path for one method of getting additional slots
16:37 mardination: because now the fetch queue content basically passes through the issue module, so it is very fast
16:38 mardination: theoretically there are two methods, one is pinning the opcodes to different flops and lsuing there by parsing the opcode from registers and data cache
16:45 mardination: I can implement all that code if needed, since I have talked about this several times, and for some reason you have not yet understood
16:54 mardination: for me, some of the issues at the moment are with remembering the tessellation and geometry shader pipeline
16:55 mardination: shader stages only, but this method can also be done in a generic way that is suitable for every shading stage
17:00 mardination: a very long book could be cooked together based on miaow to help people understand the code, but heck, i do not have the resources or momentum to fill it out
18:16 mardination: i have some more tests to do, testbenches, to reconfirm things. I'm not always entirely sure what i read :) lucky guesses
18:19 mardination: there is some percentage of accuracy in my views, i am not always entirely sure. even so, if we were to discuss miaow, i do not promise anything to be accurate; books would probably also need to undergo some reviews, have mistakes fixed, etc.
18:39 mardination: i am pretty sure i know how that hw works, but whether i explain it right is another matter.